PyTorch Neural Network Practical Study Notes_45: Defects of the Graph Convolution Model + Compensation Schemes

The graph convolution model adds feature computation between samples on top of each fully connected layer's output, so its quality depends deeply on the characteristics and defects of fully connected networks.

1.1 Features and Defects of Fully Connected Networks

Multilayer fully connected neural networks are known as universal fitting networks: multiple neuron nodes within a single layer fit low-dimensional data, and stacking layers composes these low-dimensional fits, so that in theory the features of arbitrary data can be fitted.
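As a minimal sketch of this idea (layer sizes here are illustrative, not from the original notes), the following PyTorch module stacks several fully connected layers with nonlinearities between them; each layer fits a low-dimensional piece, and the composition forms the overall fit:

```python
import torch
from torch import nn

# A minimal multilayer fully connected network (layer sizes are illustrative).
# Each Linear layer fits a low-dimensional piece; stacking composes those
# pieces into a more complex decision boundary.
class MLP(nn.Module):
    def __init__(self, in_dim=2, hidden=16, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):
        return self.net(x)

logits = MLP()(torch.randn(4, 2))  # 4 samples, 2 features -> 4 x 2 logits
```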

The two plots on the left of Figure 10-12 show that the two neuron nodes in the previous layer each split the data into two classes within their own Cartesian coordinate systems.

The plot on the right of Figure 10-12 shows that the neuron in the following layer fuses the results of the two previous neurons to produce the final classification.

1.1.1 Defect ①: prone to overfitting

In theory, a fully connected neural network with enough layers and nodes can fit arbitrary data. But this very capacity leads to overfitting: the network fits not only the regular patterns in the data but also batch-specific artifacts from training, noise in the samples, and non-essential feature attributes. The resulting model works only on the training dataset and fails on other datasets similar to it.

1.1.2 Defect ②: the model is too large and hard to train

At present, models are trained mainly by backpropagation (reverse chain-rule differentiation), which makes a fully connected network hard to train once it has many layers (generally no more than about 6 layers can be supported). Even when techniques such as BN and distributed layer-by-layer training make multi-layer training feasible, the model still cannot withstand the computational pressure of having too many parameters and the computing power its execution demands.

1.2 Defects of the graph convolution model (a problem common to graph models that rely on fully connected networks)

Graph convolution merely applies an extra filtering step to each layer's fully connected output, using a convolution kernel that carries the vertex-relationship information.
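A minimal sketch of that description (an illustration, not the DGL implementation discussed later): one graph convolution step is a fully connected transform of the features followed by multiplication with a normalized adjacency matrix that encodes the vertex relationships:

```python
import torch
from torch import nn

class SimpleGCNLayer(nn.Module):
    """One graph convolution step: a fully connected transform of x,
    then filtering by the normalized adjacency (the vertex relations)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)   # the fully connected part

    def forward(self, a_hat, x):
        # a_hat: (N, N) normalized adjacency with self-loops; x: (N, in_dim)
        return a_hat @ self.linear(x)              # the extra relational "filter"

def normalize_adj(a):
    # A_hat = D^{-1/2} (A + I) D^{-1/2}
    a = a + torch.eye(a.size(0))
    d_inv_sqrt = a.sum(1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)

a = torch.tensor([[0., 1.], [1., 0.]])             # toy 2-vertex graph
out = SimpleGCNLayer(4, 8)(normalize_adj(a), torch.randn(2, 4))
```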

Because the graph convolution model is likewise trained by backpropagation, its usable depth is also generally limited to about 6 layers.

Beyond the limit on depth, the graph convolution model also has too many parameters and overfits easily. The same problems exist in the GAT model.

1.3 Methods to compensate for the defects of the graph convolution model (the same as for fully connected networks)

1.3.1 The number of layers of the graph convolution model is limited

Use BN, distributed layer-by-layer training, and similar methods.
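A sketch of how BN can slot in, under the same dense-adjacency setup as the earlier sketch (layer sizes hypothetical): a BatchNorm1d between aggregation steps re-normalizes the node features, which is what eases optimization as layers are added:

```python
import torch
from torch import nn

class TwoLayerGCNWithBN(nn.Module):
    # Hypothetical two-layer stack; BN re-normalizes node features after
    # each aggregation, keeping gradients usable as depth grows.
    def __init__(self, in_dim, hidden, n_classes):
        super().__init__()
        self.l1 = nn.Linear(in_dim, hidden)
        self.bn = nn.BatchNorm1d(hidden)
        self.l2 = nn.Linear(hidden, n_classes)

    def forward(self, a_hat, x):
        h = torch.relu(self.bn(a_hat @ self.l1(x)))
        return a_hat @ self.l2(h)

a_hat = torch.full((3, 3), 1.0 / 3)    # toy stand-in for a normalized adjacency
logits = TwoLayerGCNWithBN(4, 8, 2)(a_hat, torch.randn(3, 4))
```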

1.3.2 The graph convolution model is prone to overfitting

Dropout, regularization, and similar methods can be used; BN also helps improve generalization ability.
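A minimal sketch of these remedies (the hyperparameter values are illustrative): dropout applied to hidden node features, plus L2 regularization via the optimizer's weight_decay:

```python
import torch
from torch import nn

dropout = nn.Dropout(p=0.5)           # randomly zeroes features during training

model = nn.Linear(16, 7)              # stand-in for a graph model's layers
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=0.01,
    weight_decay=5e-4,                # L2 regularization on all weights
)

h = torch.randn(4, 16)
h = dropout(h)                        # applied between graph conv layers
logits = model(h)
```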

1.3.3 Too many parameters

Replace the fully connected feature-computation part with convolution operations, and use parameter sharing to reduce the number of weights.
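Parameter sharing is where the saving comes from: a convolution slides one small shared kernel over all positions, while a fully connected layer learns a separate weight for every input-output pair. A quick count makes the gap concrete:

```python
from torch import nn

fc = nn.Linear(1024, 1024)                        # one weight per in-out pair
conv = nn.Conv1d(1, 1, kernel_size=3, padding=1)  # one shared 3-tap kernel

n_fc = sum(p.numel() for p in fc.parameters())      # 1024*1024 + 1024 = 1_049_600
n_conv = sum(p.numel() for p in conv.parameters())  # 3 + 1 = 4
print(n_fc, n_conv)
```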

1.3.4 Use a better model

In the field of graph neural networks there are better models (such as SGC, GfNN, and DGI). They exploit the properties of graphs to optimize the structure of the graph convolution model, repairing its original defects while showing better performance.
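As one example, SGC simplifies the graph convolution model by removing the nonlinearities between layers: K hops of neighbor aggregation are precomputed once, and only a single linear classifier is trained. A minimal sketch (the uniform-averaging matrix below is a toy stand-in for the real normalized adjacency):

```python
import torch
from torch import nn

def sgc_features(a_hat, x, k=2):
    # Precompute A_hat^k @ x once; no nonlinearity between hops (the SGC idea).
    for _ in range(k):
        x = a_hat @ x
    return x

a_hat = torch.full((3, 3), 1.0 / 3)      # toy 3-vertex graph, uniform averaging
x = torch.randn(3, 16)
classifier = nn.Linear(16, 7)            # the only trainable part of SGC
logits = classifier(sgc_features(a_hat, x, k=2))
```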

1.4 Understanding the principle and defects of graph convolution from the perspective of graph structure

The root defect of the graph convolution model is that it treats graph-structured data as matrix data and layers deep-learning computation on top of that regular matrix form.

The graph convolution implemented in the DGL library takes the graph-structure (spatial-domain) approach, which is more efficient and better matches the characteristics of graph computing.

Viewed as propagation over graph vertices, the process of a graph neural network can be understood as feature aggregation over each vertex's local neighborhood: the information of each vertex and its surrounding vertices is aggregated together and overwrites the original vertex feature.
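In DGL, this aggregate-the-neighborhood view maps directly onto message passing. A minimal sketch (using the message/reduce built-ins of recent DGL versions): copy each neighbor's feature as a message, then reduce by averaging:

```python
import dgl
import dgl.function as fn
import torch

g = dgl.graph(([0, 1, 2], [1, 2, 0]))   # a small directed 3-vertex cycle
g.ndata['h'] = torch.randn(3, 4)         # 4-dim feature per vertex

# Each vertex gathers its neighbors' features and averages them,
# overwriting its own feature with the aggregated result.
g.update_all(fn.copy_u('h', 'm'), fn.mean('m', 'h'))
```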

1.4.1 Calculation process of graph neural network

As shown in the following figure, the calculation of the target vertex A in a graph neural network proceeds as follows: in each pass, vertex A performs an aggregation operation (to arbitrary depth) over the features of its surrounding vertices.

1.4.2 Reasons why the graph convolutional neural network cannot be built with multiple layers

Each aggregation step in a graph convolutional network can be understood as a fully connected transform of the features followed by an averaging of the aggregated results. With too many layers, every vertex aggregates its neighbors too many times: the values of all vertices grow more and more similar and eventually converge to the same value, making it impossible to distinguish the individual characteristics of each vertex.
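This over-smoothing is easy to observe numerically. In the toy run below (pure aggregation with no learned transform), repeatedly multiplying the features by a normalized adjacency drives all vertex rows toward the same value:

```python
import torch

a = torch.tensor([[0., 1., 1.],
                  [1., 0., 1.],
                  [1., 1., 0.]]) + torch.eye(3)   # adjacency with self-loops
d_inv_sqrt = a.sum(1).pow(-0.5)
a_hat = d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)

x = torch.randn(3, 2)
for _ in range(50):        # simulate stacking 50 aggregation layers
    x = a_hat @ x
print(x)                   # all three vertex rows are now nearly identical
```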

1.4.3 The graph attention mechanism also cannot be built with multiple layers

The structure of the graph attention mechanism is almost the same as that of graph convolution; the only difference is that a weight ratio is assigned to each neighbor vertex during aggregation. It is therefore subject to the same over-smoothing limit on depth.
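A minimal single-head sketch of that difference (an illustration, not the full GAT implementation): learned attention scores over neighbors replace the fixed 1/degree averaging weights:

```python
import torch
from torch import nn
import torch.nn.functional as F

class TinyGATLayer(nn.Module):
    """Single-head attention aggregation over a dense adjacency (illustrative)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.attn = nn.Linear(2 * out_dim, 1)   # scores a (target, neighbor) pair

    def forward(self, adj, x):
        h = self.linear(x)                       # same FC transform as in GCN
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.attn(pairs).squeeze(-1), 0.2)
        e = e.masked_fill(adj == 0, float('-inf'))  # attend to neighbors only
        alpha = torch.softmax(e, dim=1)          # learned weights, not 1/degree
        return alpha @ h                         # weighted neighbor aggregation

adj = torch.ones(3, 3)                           # toy graph incl. self-loops
out = TinyGATLayer(4, 8)(adj, torch.randn(3, 4))
```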

Origin: blog.csdn.net/qq_39237205/article/details/123903618