Introduction: Cross-Node Federated Graph Neural Network for Spatio-Temporal Data Modeling

Paper information

Cross-Node Federated Graph Neural Network for Spatio-Temporal Data Modeling

Original link: https://dl.acm.org/doi/10.1145/3447548.3467371

Summary

Vast amount of data generated from networks of sensors, wearables, and the Internet of Things (IoT) devices underscores the need for advanced modeling techniques that leverage the spatio-temporal structure of decentralized data due to the need for edge computation and licensing (data access) issues. While federated learning (FL) has emerged as a framework for model training without requiring direct data sharing and exchange, effectively modeling the complex spatio-temporal dependencies to improve forecasting capabilities still remains an open problem. On the other hand, state-of-the-art spatio-temporal forecasting models assume unfettered access to the data, neglecting constraints on data sharing. To bridge this gap, we propose a federated spatio-temporal model – Cross-Node Federated Graph Neural Network (CNFGNN) – which explicitly encodes the underlying graph structure using graph neural network (GNN)-based architecture under the constraint of cross-node federated learning, which requires that data in a network of nodes is generated locally on each node and remains decentralized. CNFGNN operates by disentangling the temporal dynamics modeling on devices and spatial dynamics on the server, utilizing alternating optimization to reduce the communication cost, facilitating computations on the edge devices. Experiments on the traffic flow forecasting task show that CNFGNN achieves the best forecasting performance in both transductive and inductive learning settings with no extra computation cost on edge devices, while incurring modest communication cost.


Main contributions

  1. We propose Cross-Node Federated Graph Neural Network (CNFGNN), a GNN-based federated learning architecture that captures complex spatio-temporal relationships among multiple nodes while ensuring that locally generated data remains decentralized, with no additional computational cost on the edge devices.
  2. Our modeling and training procedure enables GNN-based architectures to be used in federated learning settings. We achieve this by disentangling the modeling of local temporal dynamics on edge devices from the modeling of spatial dynamics on the central server, and by using an alternating optimization procedure that updates the spatial and temporal modules via split learning and federated averaging, enabling efficient GNN-based federated learning.
  3. We demonstrate that CNFGNN achieves state-of-the-art forecasting performance (in both transductive and inductive settings) on the traffic flow forecasting task, with moderate communication cost and no additional computational cost on edge devices compared with related techniques.

CROSS-NODE FEDERATED GRAPH NEURAL NETWORK

Problem definition

Given a graph G = (V, E), a feature tensor X, and a label tensor Y, a task is defined on the dataset with X as input and Y as the prediction target. We consider learning models under the constraints of cross-node federated learning: the node features x_i = X_i and node labels y_i = Y_i are stored only on node i and never shared, and the model output on node i is visible only to node i.
A typical task requiring learning under cross-node constraints is forecasting on spatio-temporal data generated by sensor networks. In this case, V is a set of sensors and E describes the relationships between sensors. The feature tensor x_i ∈ R^{m×D} holds the records of the i-th sensor over the past m time steps in a D-dimensional space, and the label y_i ∈ R^{n×D} holds the records of the i-th sensor over the future n time steps. Since records collected by different sensors owned by different users/organizations may not be shareable, whether because of the need for edge computation or because of permission issues in accessing the data, an algorithm is needed that models spatio-temporal relationships without directly exchanging node-level data.
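To make the data layout concrete, here is a minimal sketch of the cross-node setting: each node holds its own x_i ∈ R^{m×D} and y_i ∈ R^{n×D}, and nothing is pooled centrally. The sizes and the `local_data` dictionary are illustrative assumptions, not anything from the paper.

```python
import numpy as np

# Hypothetical sizes: |V| = 3 sensors, m = 12 past steps, n = 6 future steps, D = 1.
num_nodes, m, n, D = 3, 12, 6, 1
rng = np.random.default_rng(0)

# Each node i keeps its own feature/label tensors locally; no central pooling.
local_data = {
    i: {
        "x": rng.standard_normal((m, D)),  # x_i in R^{m x D}: past m records
        "y": rng.standard_normal((n, D)),  # y_i in R^{n x D}: future n records
    }
    for i in range(num_nodes)
}

# Under cross-node federated learning, only node i may read local_data[i].
for i, d in local_data.items():
    assert d["x"].shape == (m, D) and d["y"].shape == (n, D)
```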

Cross-Node Federated Graph Neural Network (CNFGNN) model

The modeling of node-level temporal dynamics and server-level spatial dynamics is first disentangled as follows:
(i) At each node, an encoder-decoder model extracts temporal features from the data on the node and makes predictions;

(ii) On the central server, a Graph Network (GN) [6] propagates the extracted node temporal features and outputs node embeddings, which contain informative relationships between nodes.

Step (i) has access to the non-shareable node data and is executed locally on each node. Step (ii) involves only uploading and downloading the extracted hidden features and their gradients, never the raw data on the nodes. This decomposition allows node information to be exchanged and aggregated under the constraints of cross-node federated learning.
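The split between (i) and (ii) can be sketched as follows. This is a toy illustration, not the paper's implementation: `encode_locally` stands in for the on-node encoder, and the server-side GN is replaced by simple neighbor averaging over the sensor graph; all names and sizes are assumptions.

```python
import numpy as np

S = 8  # hidden-state size (assumption)

def encode_locally(x, rng):
    # Stand-in for the on-node encoder: maps an (m, D) sequence to a hidden state h_{c,i}.
    return rng.standard_normal(S)

def server_gn(hidden_states, adj):
    # Stand-in for the server-side GN: one round of neighbor averaging on the graph.
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1)
    return adj @ hidden_states / deg  # node embeddings h_{G,c,i}

rng = np.random.default_rng(0)
m, D, V = 12, 1, 3
adj = np.ones((V, V)) - np.eye(V)  # toy fully-connected sensor graph

# (i) runs on each node; only the S-dim summaries cross the boundary, never raw x_i.
h_c = np.stack([encode_locally(rng.standard_normal((m, D)), rng) for _ in range(V)])
# (ii) runs on the server and returns one embedding per node.
h_gc = server_gn(h_c, adj)
```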

Modeling of Node-Level Temporal Dynamics

We adopt a gated recurrent unit (GRU)-based encoder-decoder architecture to model node-level temporal dynamics on each node. Given the input sequence x_i ∈ R^{m×D} at the i-th node, the encoder reads the entire sequence step by step and outputs the hidden state h_{c,i} as a summary of the input sequence according to Equation 1:

h_{c,i}^{(t)} = EncoderGRU(x_i^{(t)}, h_{c,i}^{(t-1)}), t = 1, …, m;  h_{c,i} = h_{c,i}^{(m)}  (1)

where h_{c,i}^{(0)} is a zero-valued initial hidden-state vector.

To incorporate spatial dynamics into each node's predictive model, we concatenate h_{c,i} with the node embedding h_{G,c,i} received from the server, which contains spatial information, and use the result as the initial state vector of the decoder. The decoder then autoregressively generates the prediction, starting from the last frame x_i^{(m)} of the input sequence:

ŷ_i = DecoderGRU([h_{c,i}; h_{G,c,i}], x_i^{(m)})  (2)
We choose the mean squared error (MSE) between predicted and true values as the loss function, which is evaluated locally at each node.
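A minimal numpy sketch of this encoder-decoder, with a hand-rolled GRU cell and random untrained weights; the hidden size, the `out_proj` readout, and the stand-in server embedding `h_gc` are all assumptions for illustration, not the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, m, n = 1, 4, 12, 6  # feature dim, hidden size, input/output lengths (assumptions)

class GRUCell:
    """Minimal GRU cell; weights are random stand-ins, not trained parameters."""
    def __init__(self, d_in, d_h, rng):
        k = 1.0 / np.sqrt(d_h)
        self.Wz = rng.uniform(-k, k, (d_h, d_in + d_h))  # update gate
        self.Wr = rng.uniform(-k, k, (d_h, d_in + d_h))  # reset gate
        self.Wh = rng.uniform(-k, k, (d_h, d_in + d_h))  # candidate state

    def __call__(self, x, h):
        sig = lambda a: 1.0 / (1.0 + np.exp(-a))
        xh = np.concatenate([x, h])
        z = sig(self.Wz @ xh)
        r = sig(self.Wr @ xh)
        h_tilde = np.tanh(self.Wh @ np.concatenate([x, r * h]))
        return (1 - z) * h + z * h_tilde

enc, dec = GRUCell(D, H, rng), GRUCell(D, 2 * H, rng)
out_proj = rng.standard_normal((D, 2 * H))  # maps decoder state to a D-dim prediction

x = rng.standard_normal((m, D))  # local input sequence x_i
h = np.zeros(H)                  # zero-valued initial hidden state (Eq. 1)
for t in range(m):
    h = enc(x[t], h)             # h_{c,i} summarizes the whole sequence

h_gc = rng.standard_normal(H)          # node embedding from the server (stand-in)
s = np.concatenate([h, h_gc])          # decoder initial state [h_{c,i}; h_{G,c,i}]
y_prev, preds = x[-1], []
for t in range(n):                     # autoregressive decoding (Eq. 2)
    s = dec(y_prev, s)
    y_prev = out_proj @ s
    preds.append(y_prev)
preds = np.stack(preds)

y_true = rng.standard_normal((n, D))
mse = np.mean((preds - y_true) ** 2)   # local MSE loss, evaluated on the node
```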

Modeling of Spatial Dynamics

To capture complex spatial dynamics, we employ a GNN to generate node embeddings that contain the relational information of all nodes. The central server collects the hidden states of all nodes {h_{c,i} | i ∈ V} as the input to the GNN. Each GNN layer updates the input features as follows:

e'_k = φ^e(e_k, v_{s_k}, v_{r_k}, u),  ē'_i = ρ^{e→v}(E'_i)
v'_i = φ^v(ē'_i, v_i, u),  ē' = ρ^{e→u}(E')
u' = φ^u(ē', v̄', u),  v̄' = ρ^{v→u}(V')

where e_k, v_i, and u are the edge, node, and global features, respectively; φ^e, φ^v, φ^u are neural networks; and ρ^{e→v}, ρ^{e→u}, ρ^{v→u} are aggregation functions such as summation.
We set v_i = h_{c,i} and e_k = W_{r_k,s_k} (W is the weighted adjacency matrix), and feed an empty vector as the global feature of the first GNN layer. The server-side GNN outputs the embeddings of all nodes {h_{G,c,i} | i ∈ V} and sends each node its own embedding.
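One GN-style layer can be sketched in numpy as below. The φ functions are single random linear maps with tanh (stand-ins for neural networks) and the ρ aggregations are sums; the toy ring graph, sizes, and a zero vector in place of the empty global feature are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
V, S = 4, 8                          # nodes, hidden size (assumptions)
senders = np.array([0, 1, 2, 3])
receivers = np.array([1, 2, 3, 0])   # toy ring graph: edge k goes s_k -> r_k
W = rng.random((V, V))               # weighted adjacency matrix

v = rng.standard_normal((V, S))      # v_i = h_{c,i}
e = W[receivers, senders][:, None]   # e_k = W_{r_k, s_k}, a 1-dim edge feature
u = np.zeros(S)                      # zero stand-in for the empty global feature

lin = lambda d_in, d_out: rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
phi_e = lin(1 + 2 * S + S, S)        # phi^e(e_k, v_{s_k}, v_{r_k}, u)
phi_v = lin(S + S + S, S)            # phi^v(ebar'_i, v_i, u)
phi_u = lin(S + S + S, S)            # phi^u(ebar', vbar', u)

# Edge update: e'_k = phi^e(e_k, v_{s_k}, v_{r_k}, u).
e_new = np.tanh(np.stack([
    phi_e @ np.concatenate([e[k], v[senders[k]], v[receivers[k]], u])
    for k in range(len(senders))
]))
# rho^{e->v}: sum incoming updated edge features per receiving node.
ebar_i = np.zeros((V, S))
np.add.at(ebar_i, receivers, e_new)
# Node update: v'_i = phi^v(ebar'_i, v_i, u).
v_new = np.tanh(np.stack([
    phi_v @ np.concatenate([ebar_i[i], v[i], u]) for i in range(V)
]))
# rho^{e->u}, rho^{v->u} as global sums, then u' = phi^u(ebar', vbar', u).
u_new = np.tanh(phi_u @ np.concatenate([e_new.sum(0), v_new.sum(0), u]))
```

Stacking such layers and reading out `v_new` gives the node embeddings {h_{G,c,i}} that the server sends back to the nodes.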

Alternating Training of Node-Level and Spatial Models

One challenge posed by the cross-node federated learning requirement and the server-side GNN model is the high communication cost during training. Since different parts of the model are distributed across different devices, split learning, in which hidden vectors and gradients are communicated across devices, is a potential training strategy. However, if we naively train the model end-to-end with split learning, in the forward pass the central server must receive the hidden states from all nodes and send node embeddings to all nodes, and in the backward pass it must receive the gradients of the node embeddings from all nodes and send the gradients of the hidden states back to all nodes. Assuming all hidden states and node embeddings have the same size S, the total amount of data transferred in each training round of the GNN model is 4|V|S.

To reduce the high communication cost in the training phase, we instead train the models on the nodes and the GNN model on the server alternately. More specifically, in each round of training, (1) with the node embeddings h_{G,c,i} fixed, we optimize the encoder-decoder models on the nodes, and then (2) with all models on the nodes fixed, we optimize the GNN model for R rounds.

Since the models on the nodes are fixed and h_{c,i} remains unchanged while the GNN model is being trained, the server only needs to fetch h_{c,i} from the nodes once before GNN training starts, after which only the node embeddings and their gradients are communicated. Therefore, the average amount of data transmitted in each of the R training rounds of the GNN model is reduced to 2|V|S + |V|S/R = (2 + 1/R)|V|S. We provide more details of the training process in Algorithm 1 and Algorithm 2.
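The communication accounting described above amounts to simple arithmetic, sketched here with toy values for |V|, S, and R (these numbers are assumptions, not the paper's experimental settings).

```python
# Plain split learning moves hidden states and embeddings (forward) plus both
# gradients (backward) every round: 4|V|S per round.
# With alternating training, h_{c,i} is uploaded once per GNN phase of R rounds,
# and each round only moves embeddings down and their gradients up: 2|V|S.
V, S, R = 100, 64, 20  # toy sizes (assumptions)

split_learning_per_round = 4 * V * S
alternating_per_round = 2 * V * S + (V * S) / R  # = (2 + 1/R) |V| S on average

saving = split_learning_per_round - alternating_per_round
```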
[Algorithm 1 and Algorithm 2: pseudocode of the alternating training procedure; figures omitted]
To extract temporal features from each node more effectively, the encoder-decoder models on the nodes are trained with the FedAvg algorithm. This lets all nodes share the same feature extractor, and hence a common hidden space of temporal features, which avoids potential overfitting of the on-node models and leads to faster convergence and better predictive performance.
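The FedAvg step used for the on-node models can be sketched as a sample-count-weighted average of per-node weights. The `fedavg` helper and the toy weight lists below are illustrative assumptions.

```python
import numpy as np

def fedavg(local_weights, num_samples):
    """Weighted average of per-node model weights (FedAvg): each node's
    contribution is proportional to its local sample count."""
    total = sum(num_samples)
    return [
        sum(w[k] * (n / total) for w, n in zip(local_weights, num_samples))
        for k in range(len(local_weights[0]))
    ]

# Toy example: three nodes share one encoder-decoder; each holds two weight arrays.
rng = np.random.default_rng(0)
nodes = [[rng.standard_normal((2, 2)), rng.standard_normal(3)] for _ in range(3)]
counts = [10, 30, 60]

global_weights = fedavg(nodes, counts)  # broadcast back to all nodes afterwards
```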
