[Paper Introduction] Vertically Federated Graph Neural Network for Privacy-Preserving Node Classification

Paper information


Original address: https://www.ijcai.org/proceedings/2022/0272.pdf

Summary

Graph neural networks (GNNs) have achieved remarkable progress in various real-world tasks on graph data, which consists of node features and the adjacency information between nodes. High-performance GNN models always depend on both rich features and complete edge information in the graph. However, such information may in practice be isolated by different data holders, which is the so-called data isolation problem. To solve this problem, in this paper we propose VFGNN, a federated GNN learning paradigm for the privacy-preserving node classification task under the vertically partitioned data setting, which can be generalized to existing GNN models. Specifically, we split the computation graph into two parts. We leave the computations related to private data (i.e., features, edges, and labels) on the data holders, and delegate the rest of the computations to a semi-honest server. We also propose applying differential privacy to prevent potential information leakage from the server. We conduct experiments on three benchmarks, and the results demonstrate the effectiveness of VFGNN.

Main contributions

  1. A new learning paradigm (VFGNN) is proposed, which can not only be generalized to most existing GNNs but also achieves good accuracy and efficiency;
  2. Different composition strategies are proposed for servers to combine local node embeddings from data holders;
  3. The scheme is evaluated on three real-world datasets, and the results demonstrate the effectiveness of VFGNN.

Vertically Federated GNN (VFGNN)

  • For privacy reasons, computations related to private data (node features, labels, and edges) are kept on the data holders;
  • For efficiency reasons, computations related to non-private data are delegated to a semi-honest server.

The computation graph is divided into the following three sub-computation graphs:

Subfigure 1: Computations related to private features and edges

Initial node embeddings are generated from the private features of nodes, such as user features in social networks. In the vertically partitioned data setting, each data holder holds a local subset of the node features. Each data holder then generates local node embeddings by aggregating information from multi-hop neighbors using different aggregator functions.
Subfigure 2: Computations related to non-private data

Computations related to non-private data are delegated to a semi-honest server to improve efficiency. First, the server combines the local node embeddings from the data holders using different combination strategies to obtain global node embeddings. The server can then carry out the subsequent computations in plaintext. Delegating these plaintext computations to the server not only helps model accuracy but also significantly improves efficiency.

After this, the server obtains the final hidden layer and sends it back to the data holder who holds the labels to compute the predictions.

Subfigure 3: Computations related to private labels

The data holder with the label computes the prediction using the final hidden layer received from the server.
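As a concrete illustration, the label holder's step can be sketched in plain NumPy. This is a minimal sketch, not the paper's implementation: it assumes the final hidden layer already has one column per class, and the function names are ours.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the class dimension.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def label_holder_predict(final_hidden, labels):
    """The label holder turns the final hidden layer received from the
    server into class probabilities, predictions, and a cross-entropy
    loss. `final_hidden` has shape (n_nodes, n_classes); the private
    labels never leave the label holder."""
    probs = softmax(final_hidden)
    preds = probs.argmax(axis=1)
    # Cross-entropy averaged over the labelled nodes.
    loss = -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()
    return preds, loss
```

The resulting loss gradient with respect to the hidden layer is what the label holder would send back to the server to start backpropagation.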

Implementation process

  1. Data holders first apply secure multi-party computation (MPC) to collaboratively compute the initial layer of the GNN, which serves as a feature-extraction module over their private node features. Each data holder then performs neighborhood aggregation on its own private edge information alone, and finally obtains its local node embeddings.
  2. Different combination strategies are proposed for the semi-honest server to combine the local node embeddings from the data holders into global node embeddings, based on which the server performs the subsequent computations related to non-private data.
  3. The server returns the final hidden layer to the labeled party, which calculates the prediction and loss. The data holder and server perform forward and backward propagation to complete model training and prediction, during which the private data (i.e., features, edges, and labels) are always kept by the data holder itself.
  4. Differential privacy is applied to the information exchanged between the data holders and the server (e.g., local node embeddings and gradient updates) to further guard against potential information leakage at the server.

1. Generate initial node embeddings

Initial node embeddings are generated from the node features. In the vertically partitioned data setting, each data holder has a partial set of node features. Data holders can generate initial node embeddings in two ways: individually or collaboratively.
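The "individual" mode can be sketched as each data holder multiplying its private feature slice by its own weight matrix, entirely locally. This is an illustrative sketch with made-up shapes, not the paper's code; the "collaborative" mode, which jointly computes the first layer over the concatenated features under MPC, is only noted in a comment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Vertically partitioned features: both holders see the same 4 nodes,
# but disjoint slices of the feature space (3 and 2 dimensions here).
features_a = rng.normal(size=(4, 3))
features_b = rng.normal(size=(4, 2))

def initial_embedding(features, weight):
    # "Individual" mode: a purely local linear feature extractor.
    return features @ weight

emb_a = initial_embedding(features_a, rng.normal(size=(3, 8)))
emb_b = initial_embedding(features_b, rng.normal(size=(2, 8)))
# "Collaborative" mode would instead compute [X_a | X_b] @ W under MPC,
# so that neither holder reveals its private feature slice.
```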

2. Generate local node embeddings

Based on the initial node embeddings, local node embeddings are generated by using multi-hop neighborhood aggregation on the graph. It is important to note that neighborhood aggregation should be performed by data holders individually rather than cooperatively to protect private edge information.
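A minimal sketch of this local multi-hop aggregation, using a mean aggregator over the data holder's own dense adjacency matrix (one of several aggregator choices; the paper's implementation details may differ):

```python
import numpy as np

def aggregate(adj, h, hops=2):
    """Mean-aggregate multi-hop neighborhood information using only this
    data holder's private adjacency matrix `adj` (n x n, entries 0/1).
    Done locally, so private edges are never shared."""
    # Add self-loops so each node keeps its own signal.
    a = adj + np.eye(adj.shape[0])
    a = a / a.sum(axis=1, keepdims=True)  # row-normalize -> mean aggregator
    for _ in range(hops):
        h = a @ h
    return h
```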

3. Generate global node embeddings

The server combines the local node embeddings from the data holders to obtain the global node embeddings. The combination strategy (COMBINE) should be trainable and retain high representational power. Three combination strategies are designed:

  1. Concat
  2. Mean
  3. Regression

4. Use DP to enhance privacy

Data holders send local information directly to the server, such as local node embeddings during forward propagation and gradient updates during backpropagation, which may lead to potential information leakage. Differential privacy (DP) is applied to further enhance privacy.
Two DP-based data release mechanisms are introduced to further protect the proposed VFGNN: the Gaussian mechanism and the James-Stein estimator. With them, when a single entry in a data holder's local information is modified, the server is very unlikely to be able to distinguish the data before and after the modification.
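The Gaussian mechanism can be sketched as clipping each embedding row and adding calibrated noise. This is a generic sketch using the standard (epsilon, delta) noise bound, not the paper's exact calibration, and the James-Stein denoising step is omitted.

```python
import numpy as np

def gaussian_mechanism(x, clip_norm=1.0, epsilon=1.0, delta=1e-5, rng=None):
    """Clip each row of `x` to L2 norm `clip_norm` (bounding the
    sensitivity), then add Gaussian noise calibrated for
    (epsilon, delta)-DP via the classic bound (valid for epsilon <= 1)."""
    rng = rng if rng is not None else np.random.default_rng()
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x_clip = x * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    sigma = clip_norm * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return x_clip + rng.normal(scale=sigma, size=x.shape)
```

A data holder would apply this to its local node embeddings before sending them to the server, and symmetrically to gradient updates on the way back.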


Origin blog.csdn.net/weixin_43598687/article/details/127168855