GNN (Graph Neural Network) Basic Concepts

Functions: node classification and graph classification

Spatial domain: a model that considers the graph structure in space, that is, the geometric relationship between the target node and other nodes (whether or not there is a connection).

Representative model: GAT (Graph Attention Networks), the graph attention model

Neighboring node features are aggregated by a weighted summation using an attention mechanism. The weights depend entirely on the node features and are independent of the graph structure.

(You can think of pooling in a convolutional neural network as a special attention mechanism with uniform averaging weights; conversely, the attention mechanism is a general pooling method with a learned preference over the input distribution, i.e., a pooling method with parameters.)

Figure 1: Schematic diagram of graph attention network and update formula
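The original figure is not reproduced here; for reference, the standard GAT layer update equations that the numbered explanations below describe are (reconstructed in their usual form, not copied from the figure):

$$
\begin{aligned}
z_i^{(l)} &= W^{(l)} h_i^{(l)} && (1) \\
e_{ij}^{(l)} &= \mathrm{LeakyReLU}\left( \vec{a}^{(l)\top} \left[ z_i^{(l)} \,\|\, z_j^{(l)} \right] \right) && (2) \\
\alpha_{ij}^{(l)} &= \frac{\exp\left( e_{ij}^{(l)} \right)}{\sum_{k \in \mathcal{N}(i)} \exp\left( e_{ik}^{(l)} \right)} && (3) \\
h_i^{(l+1)} &= \sigma\left( \sum_{j \in \mathcal{N}(i)} \alpha_{ij}^{(l)} z_j^{(l)} \right) && (4)
\end{aligned}
$$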

Some explanations of the above formulas:

  • Equation (1) applies a linear transformation to the layer-l node embedding h_i^{(l)}; W^{(l)} is the trainable weight matrix of that transformation.

  • Equation (2) computes the raw attention score between a pair of nodes. It first concatenates the z embeddings of the two nodes (|| denotes concatenation here); then it takes the dot product of the concatenated embedding with a learnable weight vector; finally, it applies a LeakyReLU activation. This form of attention is often called additive attention, in contrast to the dot-product attention used in the Transformer.

  • Equation (3) applies a softmax operation to the original attention scores obtained from all incoming edges of a node to obtain attention weights.

  • Equation (4) is analogous to the node feature update rule of GCN: it performs an attention-weighted summation over the features of all neighboring nodes.
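A minimal PyTorch sketch of a single-head GAT layer implementing equations (1)–(4) on a dense adjacency matrix; the class name, the dense-adjacency representation, and the ELU output activation are assumptions of this sketch rather than details from the original post.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single-head GAT layer over a dense adjacency matrix (illustrative sketch)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)     # eq. (1): shared linear transform
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)   # eq. (2): learnable vector a
        self.leaky_relu = nn.LeakyReLU(0.2)

    def forward(self, h, adj):
        # h: (N, in_dim) node features; adj: (N, N) adjacency with self-loops (1 = edge)
        z = self.W(h)                                        # eq. (1)
        N = z.size(0)
        # Build all pairwise concatenations [z_i || z_j]: shape (N, N, 2*out_dim)
        z_i = z.unsqueeze(1).expand(N, N, -1)
        z_j = z.unsqueeze(0).expand(N, N, -1)
        e = self.leaky_relu(self.attn(torch.cat([z_i, z_j], dim=-1))).squeeze(-1)  # eq. (2)
        # Mask out non-edges, then softmax over each node's incoming edges
        e = e.masked_fill(adj == 0, float("-inf"))
        alpha = F.softmax(e, dim=1)                          # eq. (3)
        return F.elu(alpha @ z)                              # eq. (4), with ELU as sigma

# Toy usage: 5 nodes, 8-dim input features, random undirected graph with self-loops
h = torch.randn(5, 8)
adj = (torch.rand(5, 5) > 0.5).float()
adj = ((adj + adj.T) > 0).float()
adj.fill_diagonal_(1)
out = GATLayer(8, 4)(h, adj)   # shape: (5, 4)
```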

Frequency (spectral) domain: a model that works with the spectral representation of the graph, obtained by eigendecomposition (see the common steps below).

Representative model: GCN (Graph Convolutional Network)

Advantages: saves parameters (the feature-transformation weights are shared across all nodes)

Disadvantages: hard to apply to dynamic graphs (the normalization constants depend on a fixed graph structure)

(The weights assigned to different neighbors within the same-order neighborhood are exactly the same; it is not possible to assign different weights to different nodes in the neighborhood.)

A graph convolution operation consists of a normalized summation of neighbor features:
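For reference, the usual per-node form of this operation, consistent with the component descriptions below (reconstructed; the original formula image is not reproduced), is:

$$
h_i^{(l+1)} = \sigma\Big( \sum_{j \in \mathcal{N}(i)} \frac{1}{c_{ij}} \, W^{(l)} h_j^{(l)} \Big)
$$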

where N(i) is the set of neighbors at distance 1 from node i. We usually add an edge connecting node i to itself, so that i itself is included in N(i);

c_ij is a normalization constant determined by the graph structure;

σ is an activation function (GCN uses ReLU);

W^{(l)} is the weight matrix of the node feature transformation, shared by all nodes.

Since c_ij is related to the structure of the graph, it is difficult to directly apply the GCN model learned on one graph to another graph.
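A minimal PyTorch sketch of this layer on a dense adjacency matrix; the class name and the symmetric normalization c_ij = sqrt(|N(i)|·|N(j)|) follow the standard GCN formulation and are assumptions of this sketch, not details from the original post.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """Graph convolution with symmetric normalization over a dense adjacency matrix (sketch)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # shared by all nodes

    def forward(self, h, adj):
        # h: (N, in_dim) node features; adj: (N, N) adjacency without self-loops
        a = adj + torch.eye(adj.size(0))                   # add self-loops so i is in N(i)
        deg = a.sum(dim=1)                                 # |N(i)|, including the self-loop
        # c_ij = sqrt(|N(i)| * |N(j)|)  ->  D^{-1/2} A D^{-1/2} in matrix form
        d_inv_sqrt = deg.pow(-0.5)
        a_norm = d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)
        return F.relu(a_norm @ self.W(h))                  # GCN uses ReLU as the activation
```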

Common steps (a code sketch follows this list):

  1. Process the graph adjacency matrix (typically forming a normalized Laplacian from it).
  2. Eigendecompose that matrix to obtain its eigenvalues and eigenvectors.
  3. The core difference between methods lies in how the feature representations of neighbor nodes at distance 1 are collected and accumulated.
  4. Treat the eigenvectors as constants; the convolution kernel acts on the eigenvalues.
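A minimal PyTorch sketch of these steps on a toy graph; the normalized Laplacian and the diagonal filter θ acting on the eigenvalues are illustrative assumptions, not a specific published model.

```python
import torch

# Toy undirected graph: 3 nodes in a path, adjacency matrix without self-loops
A = torch.tensor([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])

# Step 1: process the adjacency matrix into a normalized Laplacian L = I - D^{-1/2} A D^{-1/2}
deg = A.sum(dim=1)
d_inv_sqrt = torch.diag(deg.pow(-0.5))
L = torch.eye(3) - d_inv_sqrt @ A @ d_inv_sqrt

# Step 2: eigendecompose it to get eigenvalues (spectrum) and eigenvectors (graph Fourier basis)
eigvals, eigvecs = torch.linalg.eigh(L)

# Steps 3-4: keep the eigenvectors fixed and let the "kernel" act on the eigenvalues.
# Here the kernel is a learnable diagonal filter theta (an illustrative choice).
theta = torch.nn.Parameter(torch.ones(3))
x = torch.randn(3, 1)                                # one scalar feature per node
x_hat = eigvecs.T @ x                                # graph Fourier transform
x_filtered = eigvecs @ (torch.diag(theta) @ x_hat)   # filter spectrally, transform back
```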

GAT replaces the fixed normalization operation of graph convolution with an attention mechanism: the original normalization constant is replaced by attention weights when aggregating neighbor node features.

Multi-head attention

Similar to multiple channels in a convolutional neural network, GAT introduces multi-head attention to enrich the model's capacity and stabilize training. Each attention head has its own parameters. There are generally two ways to combine the outputs of the multiple attention heads:
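For reference, the two standard combination rules (reconstructed in their usual form, since the original formula image is not reproduced) are concatenation

$$
h_i^{(l+1)} = \Big\Vert_{k=1}^{K} \, \sigma\Big( \sum_{j \in \mathcal{N}(i)} \alpha_{ij}^{k} W^{k} h_j^{(l)} \Big)
$$

and averaging

$$
h_i^{(l+1)} = \sigma\Big( \frac{1}{K} \sum_{k=1}^{K} \sum_{j \in \mathcal{N}(i)} \alpha_{ij}^{k} W^{k} h_j^{(l)} \Big)
$$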

K in the above formulas is the number of attention heads. The authors recommend concatenation for intermediate layers and averaging for the final layer.
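Continuing the GATLayer sketch above, a minimal illustration of the two combination strategies; the class name and the merge flag are hypothetical.

```python
import torch
import torch.nn as nn

class MultiHeadGATLayer(nn.Module):
    """Run K independent GATLayer heads and combine them by concatenation or averaging (sketch)."""
    def __init__(self, in_dim, out_dim, num_heads, merge="cat"):
        super().__init__()
        self.heads = nn.ModuleList([GATLayer(in_dim, out_dim) for _ in range(num_heads)])
        self.merge = merge

    def forward(self, h, adj):
        outs = [head(h, adj) for head in self.heads]
        if self.merge == "cat":                  # intermediate layers: concatenate head outputs
            return torch.cat(outs, dim=-1)
        return torch.stack(outs).mean(dim=0)     # last layer: average head outputs
```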
