Overview of Graph Neural Network (GNN) Research

Due to their advantages in processing non-Euclidean spatial data and complex features, graph neural networks have received widespread attention and are used in scenarios such as recommendation systems, knowledge graphs, and traffic/road analysis.

The irregularity of large-scale graph structures, the complexity of node features, and the dependence among training samples put great pressure on the computational efficiency, memory management, and communication overhead of graph neural network models in distributed systems. This article first briefly introduces the message passing mechanism of graph neural networks, presents common GNN models by category, and analyzes the difficulties and challenges they face in large-scale training. It then classifies and analyzes GNN algorithms for large-scale data, including sampling algorithms based on nodes, layers, and subgraphs. Finally, it reviews progress on accelerating GNN programming frameworks, covering the mainstream frameworks and a classification and analysis of their optimization techniques.

1 Introduction 

Graph structures can describe complex relationships such as genetic structures, communication networks, traffic routes, and social networks. Graph computing mines structural information but cannot learn node features; neural networks perform well on Euclidean data but cannot be applied directly to non-Euclidean graph data. Graph neural networks combine the advantages of graph computing and neural networks to process non-Euclidean spatial data and its complex features, and are applied in scenarios such as network link prediction, recommendation systems, and traffic/road analysis. In real applications the scale of graph data is huge, for example tens of billions of nodes and hundreds of billions of edges with storage overhead exceeding 10 TB. On such large-scale data, GNNs face challenges in computational efficiency, memory management, and distributed communication overhead. These challenges can be grouped into four aspects: graph data structure, GNN model, data scale, and hardware platform.

(1) Graph data structure. The irregularity, sparsity, and dynamics of the graph data structure, the power-law distribution of the number of node neighbors, and the interdependence between samples pose challenges to efficient memory access and distributed computing systems.

(2) Graph neural network model. High-dimensional node representations are a salient feature of graph neural networks; they improve model capability but increase computational and memory overhead, which is especially challenging on large-scale data. Deep GNN models also face the neighbor explosion problem caused by the iterative update mechanism.

(3) Data scale. Full-graph (whole-batch) training of graph neural networks is limited by memory, while mini-batch training increases the difficulty of data partitioning and iterative updates.

(4) Hardware platform. Graph neural network models must handle both the graph data structure and complex features, which requires flexible irregular data access as well as efficient dense computation. CPUs and GPUs each have their own advantages, but neither meets both requirements at the same time, which increases the difficulty of accelerating large-scale GNN models.

To improve the scalability of graph neural network (GNN) models, accelerate training, and reduce memory overhead, researchers have addressed these challenges at several levels. At the application level, strategies have been proposed to improve GNN processing efficiency for scenarios such as natural language processing, traffic prediction, and recommendation systems. At the algorithm level, mini-batch training is used to cope with insufficient memory, as in GraphSage and FastGCN. At the framework level, programming frameworks such as DGL and PyG have been proposed to handle the dependencies among training samples. At the hardware level, optimization strategies or dedicated accelerator architectures, such as HyGCN, have been proposed for CPUs, GPUs, FPGAs, and other platforms. The overall landscape is shown in Figure 1.

Figure 1 GNN overall framework diagram

Table 1 lists the surveys related to this article. Surveys [29-32] focus on the full-graph training mode of GNN models and their applications; however, when the number of nodes or edges is huge, training is limited by the memory of a single GPU. To address this, sampling algorithms support the transition of GNN models from full-graph training to mini-batch training on large-scale data. GNN programming frameworks combine deep learning frameworks with graph-structure characteristics to improve storage utilization and computational efficiency, promoting large-scale applications. Surveys [33-34] mainly summarize the progress of GNN programming frameworks. Surveys [36-38] focus on distributed platforms and analyze the progress of distributed GNNs in algorithm models, software frameworks, and hardware platforms.

Table 1 Summary of graph neural networks


This article surveys and analyzes large-scale graph neural networks from two aspects: algorithm models and framework optimization. It first introduces the basics and typical algorithms of GNNs, then summarizes GNN models with sampling strategies of different granularities, as well as mainstream acceleration frameworks and related techniques, providing ideas for framework-algorithm co-optimization in subsequent large-scale GNN applications.

The content of this article is organized as shown in Figure 2:

Figure 2 Content organization of this article

2 Graph Neural Network

Graph neural network (GNN) is a neural network model for graph-structured data that combines the advantages of graph computing and neural networks to capture the graph structure and abstract node features. Graph computing models are good at capturing topological structures but cannot handle high-dimensional features. Typical neural networks are suitable for Euclidean space data, such as convolutional neural networks processing grid data, and recurrent neural networks processing sequence information. For complex graph data in non-Euclidean spaces, the modeling process requires new processing mechanisms. The currently popular message propagation model improves node expression capabilities by obtaining high-order neighbor information, including two steps: neighbor aggregation and node update.

Starting from the message passing mechanism, this section introduces the aggregation and update operations of GNN models, presents graph convolutional networks, graph attention networks, recurrent graph neural networks, and autoencoder-based graph neural networks by category, and analyzes and summarizes the challenges they face in large-scale data training.

2.1 Message passing mechanism

The neural-network-based message passing mechanism describes how node features propagate through the graph; the propagation results are iteratively folded into the node representations through neural network operations. A GNN representation model built on this mechanism can capture graph structure information and model complex node features.
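As a concrete illustration, the sketch below implements one round of neighbor aggregation and node update in plain NumPy, assuming mean aggregation, a single weight matrix, and a ReLU update; the function and variable names are illustrative rather than taken from any particular published model.

```python
import numpy as np

def message_passing_step(adj, h, weight):
    """One round of message passing:
    1) neighbor aggregation: mean of neighbor representations,
    2) node update: linear transformation followed by ReLU."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)  # node degrees (avoid /0)
    aggregated = adj @ h / deg                        # mean aggregation
    return np.maximum(0.0, aggregated @ weight)       # node update

# Toy usage: a 4-node path graph, 8-dim inputs, 16-dim outputs.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
h = np.random.randn(4, 8)
w = np.random.randn(8, 16)
print(message_passing_step(adj, h, w).shape)  # (4, 16)
```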

2.2 Common models

Graph Convolutional Network (GCN). GCN is a common graph neural network model that achieves neighbor node aggregation through convolution operations. GCN models are divided into two categories: spectral domain-based and spatial domain-based. The spectral domain-based method is based on graph signal analysis and graph theory, and the spatial domain-based method focuses on the direct aggregation of neighbor nodes.

In large-scale data training, GCN faces insufficient memory and neighbor explosion. Mini-batch training alleviates the memory constraint but introduces extra computational and memory consumption, and as the number of layers increases, resource consumption grows exponentially with the expanding neighborhood.
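For reference, below is a minimal dense NumPy sketch of the commonly used symmetric-normalization propagation rule H' = ReLU(D^{-1/2}(A+I)D^{-1/2} H W); a practical implementation would use sparse matrices, and the shapes here are only illustrative.

```python
import numpy as np

def gcn_layer(adj, h, weight):
    """One GCN layer with self-loops and symmetric normalization:
    H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    n = adj.shape[0]
    a_hat = adj + np.eye(n)                        # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # diagonal of D^{-1/2}
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(0.0, a_norm @ h @ weight)
```

Stacking k such layers lets each node see its k-hop neighborhood, which is exactly why memory and computation grow rapidly with depth in mini-batch training.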

Graph Attention Network (GAT). GAT is a deep learning model for graph-structured data. It introduces an attention mechanism that assigns a different weight to each neighbor, capturing the dependencies between nodes. GAT is efficient and scalable among graph neural networks and is widely used in social networks, recommendation systems, bioinformatics, and other fields.

In large-scale data training, both GAT and GCN have problems of insufficient memory and neighbor explosion. GAT uses attention-weighted aggregation, which consumes more computing and storage resources.
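The following dense, single-head sketch illustrates how GAT-style attention weights are computed and used in aggregation; it is a simplified illustration (the attention vector `attn_vec` is assumed to have length twice the output dimension), not the multi-head implementation used in practice.

```python
import numpy as np

def gat_layer(adj, h, weight, attn_vec, negative_slope=0.2):
    """Single-head GAT-style layer sketch:
    e_ij = LeakyReLU(a^T [W h_i || W h_j]) for connected pairs,
    softmax-normalized over each node's neighborhood (including itself),
    then used as weights in the neighbor aggregation."""
    n = adj.shape[0]
    adj_loop = adj + np.eye(n)                      # nodes also attend to themselves
    wh = h @ weight                                 # (n, d_out)
    d = wh.shape[1]
    src = wh @ attn_vec[:d]                         # contribution of h_i
    dst = wh @ attn_vec[d:]                         # contribution of h_j
    e = src[:, None] + dst[None, :]                 # (n, n) raw scores
    e = np.where(e > 0, e, negative_slope * e)      # LeakyReLU
    e = np.where(adj_loop > 0, e, -np.inf)          # mask non-edges
    e = e - e.max(axis=1, keepdims=True)            # numerical stability
    att = np.exp(e)
    att = att / att.sum(axis=1, keepdims=True)      # softmax per neighborhood
    return np.maximum(0.0, att @ wh)                # attention-weighted aggregation
```

The extra pairwise score computation per attention head is one reason GAT consumes more compute and storage than GCN.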

Gated Graph Neural Network (GGNN). Recurrent neural networks (RNNs) are used to model sequential information such as text, user histories, and audio/video; long short-term memory (LSTM) and the gated recurrent unit (GRU) are two common variants. Whereas GCN and GAT take static graphs as input, GGNN is built on the GRU and targets tasks that output state sequences: it takes a graph evolving over time as input and captures the evolution of the graph structure through gating structures such as update and reset gates.

In large-scale data training, GGNN needs to load the entire adjacency matrix, which occupies a large amount of memory, and training involves a large number of parameters, posing significant memory challenges. In mini-batch training, the irregularity of graph data increases redundant computation.
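A simplified PyTorch sketch of one GGNN-style propagation step is shown below, assuming a dense adjacency matrix and a single edge type; the class name and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class GGNNStep(nn.Module):
    """One GGNN-style propagation step (simplified): messages are the sum of
    linearly transformed neighbor states, and each node state is updated by a
    GRU cell, whose gates decide what information to keep or discard."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.msg = nn.Linear(hidden_dim, hidden_dim)
        self.gru = nn.GRUCell(hidden_dim, hidden_dim)

    def forward(self, adj, h):
        messages = adj @ self.msg(h)   # aggregate transformed neighbor states
        return self.gru(messages, h)   # gated update of node states

# Toy usage: 5 nodes with 16-dimensional states.
adj = torch.randint(0, 2, (5, 5)).float()
h = torch.randn(5, 16)
print(GGNNStep(16)(adj, h).shape)  # torch.Size([5, 16])
```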

Autoencoder-based graph neural network (Structural Deep Network Embedding, SDNE). Autoencoders consist of an encoder and a decoder and learn node representations efficiently through unsupervised learning. The SDNE model applies autoencoders to graph-structured data: like typical autoencoders, it minimizes the reconstruction loss of nodes, and in addition it takes the similarity between nodes into account.

SDNE cannot capture higher-order correlations between nodes, and the direct (first-order) correlations must be captured through an explicit loss term. In large-scale training, memory limitations lead to redundant computation in mini-batch training, and despite negative sampling, the irregularity of graph data still poses challenges.
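The sketch below outlines an SDNE-style objective in PyTorch: an autoencoder reconstructs each node's adjacency row (second-order proximity), while a pairwise term pulls the embeddings of connected nodes together (first-order proximity). It omits details of the original model, such as the penalty matrix that up-weights nonzero adjacency entries; names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class SDNESketch(nn.Module):
    """Simplified SDNE-style autoencoder: encode each node's adjacency row,
    reconstruct it (second-order proximity), and add a Laplacian-style term
    that keeps connected nodes close in embedding space (first-order)."""
    def __init__(self, n_nodes, emb_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_nodes, 128), nn.ReLU(),
                                     nn.Linear(128, emb_dim))
        self.decoder = nn.Sequential(nn.Linear(emb_dim, 128), nn.ReLU(),
                                     nn.Linear(128, n_nodes))

    def forward(self, adj, alpha=1.0):
        z = self.encoder(adj)                        # node embeddings
        recon = self.decoder(z)
        loss_2nd = ((recon - adj) ** 2).sum()        # reconstruction loss
        dist = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
        loss_1st = (adj * dist).sum()                # connected nodes stay close
        return loss_2nd + alpha * loss_1st, z

# Toy usage on a random 6-node graph.
adj = torch.randint(0, 2, (6, 6)).float()
loss, z = SDNESketch(n_nodes=6, emb_dim=4)(adj)
```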

Table 3 summarizes the challenges under different training modes (full-graph training and mini-batch training) from three aspects: the GNN model, the graph data structure, and the data scale.

Table 3 Challenges of graph neural networks in large-scale data applications


(* indicates the main reason for the relevant challenge)

3 Graph neural network sampling algorithm 

In response to the challenges faced by graph neural networks in large-scale data training, a considerable amount of algorithm-level optimization work has been carried out. Most of it focuses on data optimization, and the core approach is to use sampling algorithms of different granularities to enable mini-batch training. These algorithms fall into three categories according to sampling granularity: node-based, layer-based, and subgraph-based sampling algorithms.

3.1 Node-based sampling algorithm

GraphSage. GraphSage uses node sampling for representation learning and parameter optimization. As shown in Figure 3(b), a fixed number of neighbors are randomly sampled for each target node, an aggregation function combines their features, and the model parameters are learned by backpropagation. Random neighbor sampling regularizes the irregular graph structure, enables parameter sharing, and allows representations to be produced for new data by the optimized model.
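A minimal sketch of the fixed-size neighbor sampling that underlies this style of mini-batch training is shown below; the data structures and fanout values are illustrative, and sampling is done with replacement when a node has fewer neighbors than the fanout.

```python
import random

def sample_neighbors(adj_list, nodes, fanouts):
    """Fixed-size neighbor sampling (GraphSage style).
    adj_list : dict mapping node id -> list of neighbor ids
    nodes    : target (mini-batch) nodes
    fanouts  : e.g. [10, 5] samples 10 first-hop and 5 second-hop neighbors
    Returns the node frontier sampled at each hop."""
    layers = [list(nodes)]
    frontier = set(nodes)
    for k in fanouts:
        next_frontier = set()
        for v in frontier:
            neigh = adj_list.get(v, [])
            sampled = random.choices(neigh, k=k) if neigh else []  # with replacement
            next_frontier.update(sampled)
        layers.append(sorted(next_frontier))
        frontier = next_frontier
    return layers

# Toy usage on a small adjacency list.
adj_list = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
print(sample_neighbors(adj_list, [0], fanouts=[2, 2]))
```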

PinSage. The PinSage algorithm combines random walks and graph convolution operations for large-scale recommendation systems. It constructs the computation graph through node sampling, captures the structural characteristics of the graph, and improves the scalability of graph convolutional networks on large-scale data. As shown in Figure 3(c), it proposes an importance-based node sampling algorithm: a random walk strategy is used to estimate node importance, the k most important nodes are selected as the sampled neighborhood of each node, and importance weighting is applied during aggregation.
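The sketch below illustrates the random-walk-based importance estimation in a simplified form: short walks are started from the target node, visit counts define importance, and the normalized counts of the top-k nodes can serve as aggregation weights. The walk length and walk count are arbitrary illustrative values.

```python
import random
from collections import Counter

def importance_neighbors(adj_list, node, k, num_walks=200, walk_len=3):
    """Random-walk importance sampling sketch (PinSage style): count how often
    other nodes are visited by short walks from `node`, keep the top-k most
    visited nodes, and return them with normalized visit counts as weights."""
    visits = Counter()
    for _ in range(num_walks):
        cur = node
        for _ in range(walk_len):
            neigh = adj_list.get(cur, [])
            if not neigh:
                break
            cur = random.choice(neigh)
            if cur != node:
                visits[cur] += 1
    top = visits.most_common(k)
    total = sum(c for _, c in top) or 1
    return [(n, c / total) for n, c in top]   # (neighbor, aggregation weight)
```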

VR-GCN. VR-GCN is a sampling algorithm that addresses parameter sharing in large-scale graph neural networks, guarantees convergence through variance reduction, and shows that the sampling size does not affect the local optimum reached. As shown in Figure 3(d), VR-GCN samples only two neighbors per target node and uses historical activations to reduce variance, which significantly lowers the bias and variance of the estimated gradients. Compared with aggregating all neighbors, considering only two neighbors greatly reduces the time complexity and memory overhead of training.
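Conceptually, the variance-reduced aggregation can be written as a control-variate estimator: the cheaply maintained historical activations are aggregated over the full normalized adjacency, and only the change relative to history is estimated from the small, rescaled sampled adjacency. The sketch below shows this idea with dense NumPy arrays; it is a schematic of the estimator, not the full training procedure.

```python
import numpy as np

def vrgcn_aggregate(adj_sampled, adj_full, h, h_hist):
    """Control-variate aggregation sketch (VR-GCN style).
    adj_sampled : rescaled normalized adjacency restricted to the sampled neighbors
    adj_full    : full normalized adjacency
    h, h_hist   : current and stored (historical) activations
    Returns a low-variance estimate of adj_full @ h."""
    return adj_sampled @ (h - h_hist) + adj_full @ h_hist
```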

LGCL. LGCL restructures graph data to meet the requirements of convolution operations: it converts irregular graph-structured data into a Euclidean form through node feature reorganization, making it easy to apply CNN-style optimizations. However, the reorganization based on salient features destroys the diversity of node features to some extent and aggravates over-smoothing of node representations. Taking Figure 3(e) as an example, the sampling-and-aggregation scheme drives node feature values toward the maximum of the corresponding feature, so that all node representations eventually tend to be similar, exacerbating the over-smoothing problem of graph convolutional networks.

Summary. To overcome the limitations of transductive training in graph neural networks, GraphSage proposed a node-based sampling algorithm that randomly samples first- and second-order neighbors to support inductive tasks. PinSage proposed importance-based sampling and applies importance weighting during aggregation. VR-GCN focuses on the convergence of sampling and improves it by reducing the bias and variance of the gradient estimates. LGCL filters and reorganizes features into new nodes for aggregation.

Figure 3 Node-based sampling algorithms

3.2 Layer-based sampling algorithm

FastGCN. FastGCN addresses the time and memory overhead of large-scale GNN training by rewriting the graph convolution operation as an integral over a probability distribution, as shown in Figure 4(a), and estimating the integral with the Monte Carlo method. It uses layer-wise sampling to avoid neighbor explosion, trains with sampled loss and gradient functions, and improves performance through importance sampling.
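The sketch below shows the layer-wise importance sampling idea in NumPy: nodes of the next layer are drawn independently with probability proportional to the squared column norms of the normalized adjacency, and the aggregation is rescaled by 1/(s·q) so that the Monte Carlo estimate stays unbiased. Variable names are illustrative.

```python
import numpy as np

def fastgcn_layer_sample(adj_norm, num_samples):
    """Layer-wise importance sampling (FastGCN style):
    q(u) is proportional to ||adj_norm[:, u]||^2; sample `num_samples` nodes
    i.i.d. from q and return their indices with 1/(s*q) rescaling factors."""
    q = (adj_norm ** 2).sum(axis=0)
    q = q / q.sum()                                      # importance distribution
    idx = np.random.choice(adj_norm.shape[1], size=num_samples, p=q)
    scale = 1.0 / (num_samples * q[idx])
    return idx, scale

# Usage inside a layer:
#   adj_norm @ H  ≈  adj_norm[:, idx] @ (H[idx] * scale[:, None])
```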

AS-GCN. AS-GCN is an adaptive layer-wise sampling algorithm that avoids the neighbor explosion problem in GCN by fixing the number of sampled nodes per layer. Lower-layer nodes are sampled conditioned on the upper-layer sampling results, so that the sampled lower-layer neighbors are shared by as many upper-layer nodes as possible. AS-GCN also captures second-order similarity through skip connections, using them to obtain second-order neighbor features and propagate higher-order neighbor information without additional sampling overhead.

LADIES. LADIES is a sampling algorithm designed to address the shortcomings of both node-based and layer-based sampling. As shown in Figure 4(d), it constructs a bipartite graph between the upper-layer nodes and their neighbors, computes importance scores as sampling probabilities, and samples a fixed number of neighbors as the lower layer according to these probabilities. By building the whole computation graph iteratively in this way, computational and memory overhead are effectively reduced.
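A NumPy sketch of the layer-dependent sampling step is given below: the normalized adjacency is restricted to the rows of the already-sampled upper-layer nodes, the squared column norms of this bipartite block give the importance distribution, and the lower layer is drawn from it without replacement. The dense representation and names are illustrative.

```python
import numpy as np

def ladies_layer_sample(adj_norm, upper_nodes, num_samples):
    """Layer-dependent importance sampling (LADIES style).
    adj_norm    : (n, n) normalized adjacency (dense here for illustration)
    upper_nodes : indices of the already-sampled upper-layer nodes
    num_samples : lower-layer sample size (must not exceed the number of
                  candidate neighbors with nonzero probability)."""
    block = adj_norm[upper_nodes, :]          # bipartite block: upper layer x all nodes
    p = (block ** 2).sum(axis=0)
    p = p / p.sum()                           # layer-dependent importance scores
    lower = np.random.choice(adj_norm.shape[0], size=num_samples,
                             replace=False, p=p)
    return lower, p[lower]
```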

Summary. FastGCN estimates the integral through layer-wise importance sampling and avoids neighbor explosion, but suffers from sparse connections and redundant nodes. AS-GCN ensures convergence through explicit variance reduction and captures second-order correlations. LADIES performs layer-dependent importance sampling on a bipartite graph between adjacent layers to alleviate neighbor explosion, but global node reuse is limited.

Figure 4 Layer-based sampling algorithms

3.3 Subgraph-based sampling algorithm

Cluster-GCN. Cluster-GCN improves the efficiency of mini-batch GCN training through subgraph sampling. The nodes are divided into c blocks with the clustering-aware partitioning algorithm METIS, which rearranges the adjacency matrix into a block-diagonal form. The GCN representation function is then decomposed over the clustering blocks, and the problems of missing inter-cluster edges and estimation error are alleviated by randomly combining blocks: in each batch, multiple clusters are selected at random instead of using a single cluster as the training data.
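The batch construction step can be sketched as follows, assuming the clusters have already been produced by a partitioner such as METIS: a few clusters are drawn at random, merged into one node set, and the corresponding sub-adjacency and sub-features form the batch, which restores part of the between-cluster edges dropped by partitioning. This is an illustrative sketch, not the authors' implementation.

```python
import random
import numpy as np

def cluster_gcn_batch(clusters, adj, features, q):
    """Build one Cluster-GCN-style batch by randomly combining q clusters.
    clusters : list of 1-D arrays of node indices (from a graph partitioner)
    adj      : (n, n) adjacency matrix
    features : (n, d) node feature matrix"""
    chosen = random.sample(range(len(clusters)), q)
    nodes = np.concatenate([clusters[c] for c in chosen])
    sub_adj = adj[np.ix_(nodes, nodes)]   # within-batch edges, including edges
                                          # between the randomly combined clusters
    sub_x = features[nodes]
    return nodes, sub_adj, sub_x
```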

RWT. RWT is a layer-wise walk-based training strategy designed to reduce the time and space overhead of Cluster-GCN on large-scale graphs. The graph is divided into mini-batches through subgraph sampling, and a GNN model is trained on each batch. The sampling strategy jointly considers randomness and the connectivity of the graph structure, expanding the subgraph layer by layer from the neighbors of the current subgraph until a threshold is reached. RWT is validated on GCN and GAT.

GraphSAINT. GraphSAINT is a sampling-based graph neural network model that first samples subgraphs and then builds the network model on them, eliminating mini-batch bias and reducing batch variance. It first estimates the sampling probabilities of nodes and edges, then samples a subgraph for each training batch and trains a complete GCN model on it; bias is removed through normalization, and the sampling algorithm is further optimized with a random-walk strategy. Zeng et al., who proposed GraphSAINT, improve accuracy through bias elimination and variance reduction. They also proposed a parallel training framework that improves program parallelism through subgraph-sampled mini-batch training; inter- and intra-sampler parallelization theoretically yields near-linear speedups. For feature propagation, data partitioning is used to improve cache utilization and reduce communication overhead, and a runtime scheduler further optimizes parallel performance by rearranging the order of operations and adjusting graph batching.

SHADOW-GNN. SHADOW-GNN is a graph neural network model designed to handle large-scale data while alleviating over-smoothing. By decoupling the node receptive field from the depth of the network, it achieves deep expressive power without over-smoothing. SHADOW-GNN adopts a subgraph sampling strategy to form different subgraphs and then applies a GNN of arbitrary depth on each subgraph to obtain node representations.

Summary. Cluster-GCN improves node utilization through node clustering, as shown in Figure 5(c); RWT expands subgraphs layer by layer with a random-walk strategy, as shown in Figure 5(b); GraphSAINT focuses on reducing estimation bias and variance to improve model performance; SHADOW-GNN uses subgraph sampling to enhance scalability and alleviate over-smoothing, as shown in Figure 5(d).

Figure 5 Subgraph-based sampling algorithms

Zeng et al. compared the node classification accuracy of four sampling algorithms on five data sets (Table 4); the results are shown in Table 5. The subgraph-based sampling algorithms perform better across data sets, with higher micro-F1 scores and smaller variance. GraphSage achieves accuracy close to the subgraph-based algorithms on Flickr, Reddit, Yelp, and Amazon, but its training time is longer.

Table 4 Data set statistics


Table 5 Node classification performance comparison


To address the challenges of large-scale data training, this section has summarized sampling algorithms of different granularities (Table 6): node-based, layer-based, and subgraph-based. These algorithms alleviate the memory limitations of large-scale training to some extent, increase model scalability, and improve convergence through techniques such as importance sampling, variance reduction, and random cluster combination. However, current sampling algorithms are mainly designed for static homogeneous graphs, ignoring characteristics of real-world graph data such as heterogeneity, dynamics, and power-law distributions.

Table 6 Summary of sampling algorithm


4 Graph neural network framework 

The computation of graph neural networks involves irregular memory access and complex feature calculations, and traditional deep learning frameworks perform poorly on graph computation. To address this, researchers have proposed dedicated GNN programming frameworks and explored optimization techniques, laying the foundation for running and optimizing large-scale GNN models.

4.1 Graph neural network programming framework

This section summarizes mainstream programming frameworks such as Deep Graph Library (DGL), PyTorch Geometric (PyG), and Graph-Learn, as shown in Table 7:

Table 7 Graph neural network programming frameworks
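As a flavor of how such frameworks are used, the snippet below builds a tiny two-layer GCN with PyTorch Geometric (assuming `torch` and `torch_geometric` are installed); the graph, feature dimensions, and class names are toy values. DGL offers analogous modules such as `dgl.nn.GraphConv`.

```python
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv

# A tiny graph: 3 nodes, 2 undirected edges, 8-dimensional features.
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)
data = Data(x=torch.randn(3, 8), edge_index=edge_index)

class TwoLayerGCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(8, 16)
        self.conv2 = GCNConv(16, 2)

    def forward(self, data):
        h = F.relu(self.conv1(data.x, data.edge_index))
        return self.conv2(h, data.edge_index)

model = TwoLayerGCN()
print(model(data).shape)  # torch.Size([3, 2])
```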

4.2 Framework related optimization technology

Optimization techniques related to GNN programming frameworks can be divided into five categories according to the aspect they target: data partitioning, task scheduling, parallel execution, memory management, and others. They are summarized in Table 8:

Table 8 Optimization techniques related to graph neural network frameworks

5 Summary and Outlook

Common GNN models and their challenges. This article first introduced four common graph neural network models: the graph convolutional network, the graph attention network, the recurrent graph neural network, and the autoencoder-based graph neural network, analyzed the problems each faces in large-scale data applications, and classified and summarized the model-level challenges. Related research was then reviewed from two aspects: algorithm models and programming frameworks.

Algorithm models. To address the challenges that large-scale data brings to GNN training, most optimization work focuses on sampling algorithms. According to sampling granularity, this article divides existing work into three categories: node-based, layer-based, and subgraph-based sampling algorithms. For each category, the main models are introduced and analyzed, followed by an overall summary.

Programming frameworks. This article first summarized mainstream programming frameworks such as DGL and PyG, then divided existing optimization techniques into five categories: data partitioning, task scheduling, parallel execution, memory management, and others. For each category, the optimization objectives were briefly introduced and specific strategies listed, followed by an overall summary.

Outlook. This article has summarized progress on optimizing graph neural networks for large-scale data, covering model optimization and framework acceleration. Future work in these two directions is outlined in Figure 6:

Figure 6 Future work prospects
