Graph Neural Network (GNN)

1. Overview

        1. Graphs

Graphs are used to represent relationships between entities (entities are represented as nodes, and relationships are represented as edges).

                Relationships (edges) are divided into directed and undirected.

         2. Representing data as graphs

                Taking an image file as an example, we can use an adjacency matrix to represent it. Each node represents a pixel; if a pixel has x adjacent pixels, then x edges represent its relationships to those neighbors.

                 Taking a text sequence as an example, an adjacency matrix can also represent a sentence: each pair of adjacent words is connected by a directed edge.

                 Likewise, molecules, social networks, knowledge graphs, etc. can all be represented as adjacency matrices.
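
                As a concrete illustration of the text-sequence case above, here is a minimal Python sketch (the sentence, variable names, and use of NumPy are illustrative assumptions, not from the original post):

import numpy as np

# Toy sentence: each adjacent pair of words gets one directed edge.
words = ["graphs", "are", "all", "around", "us"]
n = len(words)

# Adjacency matrix: A[i, j] = 1 means a directed edge from word i to word j.
A = np.zeros((n, n), dtype=int)
for i in range(n - 1):
    A[i, i + 1] = 1   # connect each word to the word that follows it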

        3. Task types

                Tasks come at the graph level, the vertex level, and the edge level. The classification depends on which level's attributes (graph, vertex, or edge) the task needs to learn and predict.

2. Graphs and Neural Networks

        1. The data structure of the graph

                A graph contains: vertex attributes, edge attributes, global information, and connectivity. The vertex attributes, edge attributes, and global information can all be expressed as vectors. Connectivity, however, cannot be represented by an ordinary matrix (its representation needs to be independent of the vertex ordering).

                As an example, a graph with 8 vertices and 7 edges (figure omitted) can be expressed as follows:

Nodes
[0,1,1,0,0,1,0,0]

                Represents the attributes of each vertex (scalar/vector) 

Edges
[2,1,1,1,2,1,1]

                Represents the attributes of each edge (scalar/vector) 

Adjacency List
[[1,0],[2,0],[4,3],[6,2],[7,3],[7,4],[7,5]]

                 Length = number of edges; the i-th vector indicates which nodes are connected by the i-th edge.

Global
0

                global information (scalar/vector)
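
                Putting the four parts together, a minimal Python sketch of this example graph might look as follows (the dictionary layout and NumPy conversion are illustrative assumptions):

import numpy as np

# The example graph from above: 8 vertices, 7 edges, one global value.
graph = {
    "nodes":  [0, 1, 1, 0, 0, 1, 0, 0],          # one attribute per vertex
    "edges":  [2, 1, 1, 1, 2, 1, 1],             # one attribute per edge
    "adj":    [[1, 0], [2, 0], [4, 3], [6, 2],   # pair i = the two vertices
               [7, 3], [7, 4], [7, 5]],          # joined by edge i
    "global": 0,
}

# The adjacency list can be expanded into an adjacency matrix when needed.
n = len(graph["nodes"])
A = np.zeros((n, n), dtype=int)
for a, b in graph["adj"]:
    A[a, b] = A[b, a] = 1   # treating the edges as undirected here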

        2. Graph Neural Network (GNN)

                GNN: an optimizable transformation of all attributes of the graph (vertices, edges, global) that preserves the graph's symmetry (that is, the result does not change when the vertices are reordered). The input and output of such a message-passing neural network are both graphs: it transforms the vertex and edge attributes but does not change the connectivity.

                After the graph is output, depending on the task, construct an MLP (multilayer perceptron) for the vertex vectors, the edge vectors, and the global vector respectively.

                After the MLP, add a fully connected layer and an activation function to obtain the predicted value (all attributes share one fully connected layer).

                 Pooling: if a vertex needs to be predicted but has no vertex vector, the vectors of all edges connected to that vertex, plus the global vector, can be summed to stand in for the vertex's vector.

                        Similarly, a missing edge vector or global vector can be replaced by aggregating the other vectors.
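
                A minimal sketch of this pooling idea, reusing the adjacency-list format from the example above (the function name, dimensions, and random data are illustrative assumptions):

import numpy as np

def pool_vertex(v, adj, edge_vecs, global_vec):
    # Represent vertex v by summing the vectors of all edges incident
    # to v, plus the global vector (sum pooling).
    incident = [edge_vecs[i] for i, (a, b) in enumerate(adj) if v in (a, b)]
    return np.sum(incident, axis=0) + global_vec

rng = np.random.default_rng(0)
adj = [[1, 0], [2, 0], [4, 3], [6, 2], [7, 3], [7, 4], [7, 5]]
edge_vecs = rng.normal(size=(7, 4))    # a 4-dim vector per edge
global_vec = rng.normal(size=4)

v0_vector = pool_vertex(0, adj, edge_vecs, global_vec)   # stands in for vertex 0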

                 The simplest GNN structure is as follows:

                         The graph is fed into the GNN unit; the vertex, edge, and global vectors are each transformed to produce a new graph, and the prediction is obtained through the fully connected layer and the activation function. If some attributes are missing, they are replaced by aggregating the other attributes.
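
                One layer of this simplest GNN could be sketched as below: three independent transformations, one per attribute type, with the connectivity passed through untouched (the single-linear-layer "MLP" and random weights are simplifying assumptions):

import numpy as np

rng = np.random.default_rng(0)

def make_mlp(d_in, d_out):
    # A tiny stand-in for an MLP: one linear map plus ReLU.
    W = rng.normal(size=(d_in, d_out))
    return lambda x: np.maximum(x @ W, 0.0)

d = 4                                  # attribute vector dimension
f_v, f_e, f_u = make_mlp(d, d), make_mlp(d, d), make_mlp(d, d)

V = rng.normal(size=(8, d))            # vertex attribute vectors
E = rng.normal(size=(7, d))            # edge attribute vectors
u = rng.normal(size=(1, d))            # global attribute vector

# One GNN layer: each attribute set is transformed independently;
# the adjacency list is carried through unchanged.
V2, E2, u2 = f_v(V), f_e(E), f_u(u)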

        3. The role of aggregation in GNNs

                The simplest GNN does not use the connectivity of the graph at all. To make full use of the connectivity, we can improve the GNN with message passing.

                The specific method is to add the vector of a vertex and the vectors of its neighboring vertices together to form an aggregated vector, send this aggregated vector through the MLP, and use the output as the vertex's vector in the next layer.

                Aggregation is similar to convolution on an image. The biggest difference from convolution is that convolution takes a weighted sum over the elements in the kernel's receptive field (kernel weight × pixel value), while the aggregation here is an unweighted sum.

                Pooling can also take the maximum or the average, analogous to MaxPooling and MeanPooling in CNNs.
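
                A sketch of one message-passing update with sum aggregation (the adjacency list is taken from the example graph above; the single-linear-map MLP is a simplifying assumption):

import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 4
V = rng.normal(size=(n, d))            # vertex vectors

A = np.zeros((n, n))
for a, b in [[1, 0], [2, 0], [4, 3], [6, 2], [7, 3], [7, 4], [7, 5]]:
    A[a, b] = A[b, a] = 1.0

W = rng.normal(size=(d, d))            # the layer's MLP (one linear map here)

# Sum aggregation: each vertex adds its neighbours' vectors to its own,
# then the aggregated vector goes through the MLP to the next layer.
aggregated = V + A @ V                 # unweighted sum over neighbours
V_next = np.maximum(aggregated @ W, 0.0)

# Mean aggregation would divide A @ V by the vertex degrees;
# max aggregation would take an element-wise maximum over neighbours.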

                 The aggregation operation is not one-way. When stacking layers, we can pass information from vertices to edges and then back again, entering the MLP once all the passes are complete. The order of passing affects the final result (the directions can be alternated).

                 If the graph is not connected tightly enough, a node's information has to travel far to reach a target node. We can add a master node (a virtual node connected to every vertex on the graph, denoted U_n); when vertex information needs to be aggregated to the edges, the information in U_n is aggregated along with it.

                 After the above processing, a GNN can make predictions using either a vertex's own vector or all of the information related to that vertex. Such models are called "message passing based graph neural networks".

3. Sampling and batching of graphs

        When the number of network layers is large, the graph must be sampled during gradient computation (otherwise it takes up too much memory).

        Common sampling methods are as follows:

                ① Random sampling: randomly select some points and take their nearest neighbors; the number of sampled points can be specified

                ② Random walk: starting from some vertex, walk randomly along the edges of the graph; the number of steps can be specified (see the sketch after this list)

                ③ Random sampling + random walk

                ④ Breadth-first traversal: take a point and perform a k-step breadth-first traversal over its 1-hop, 2-hop, ..., k-hop neighbors
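
        A minimal random-walk sampler over the example graph's neighbour lists might look as follows (the function name and neighbour-list layout are illustrative assumptions):

import random

def random_walk(adj_list, start, num_steps, seed=0):
    # Sample a set of vertices by walking randomly along edges from `start`.
    rng = random.Random(seed)
    visited, v = {start}, start
    for _ in range(num_steps):
        neighbours = adj_list[v]
        if not neighbours:
            break
        v = rng.choice(neighbours)
        visited.add(v)
    return visited

# Neighbour lists for the example graph (vertex -> neighbours).
adj_list = {0: [1, 2], 1: [0], 2: [0, 6], 3: [4, 7],
            4: [3, 7], 5: [7], 6: [2], 7: [3, 4, 5]}

subgraph_vertices = random_walk(adj_list, start=0, num_steps=5)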

4. Assumptions

        For CNNs, the assumption is invariance under spatial translation; for RNNs, the assumption is temporal continuity; and for GNNs, the assumption is the symmetry of the graph (permuting the vertices does not change the nature of the graph).

5. GNN-related techniques

        ①GCN network

                GCN (graph convolutional networks): graph convolutional neural networks, i.e., graph neural networks with aggregation operations.

                Assuming a GCN has k layers and each layer uses breadth-first traversal to gather 3 neighbors of the target point, it can be viewed as roughly analogous to a k-layer 3×3 convolutional neural network, with the sampled subgraph playing the role of a feature map.

                 Convolution on a graph in a GCN is equivalent to multiplying by the adjacency matrix.
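
                As a sketch of this claim, one GCN layer in the widely used normalized form H' = ReLU(D^(-1/2)(A+I)D^(-1/2) H W) is literally a matrix product with the (self-loop-augmented) adjacency matrix (the random features and weights are illustrative):

import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 4

A = np.zeros((n, n))
for a, b in [[1, 0], [2, 0], [4, 3], [6, 2], [7, 3], [7, 4], [7, 5]]:
    A[a, b] = A[b, a] = 1.0

A_hat = A + np.eye(n)                             # add self-loops
D_inv_sqrt = np.diag(A_hat.sum(axis=1) ** -0.5)   # symmetric degree normalization

H = rng.normal(size=(n, d))                       # input vertex features
W = rng.normal(size=(d, d))                       # learned weights

# One GCN layer: multiply by the normalized adjacency matrix, then by W.
H_next = np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)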

        ② GAT networks

               GAT (graph attention networks): the aggregation in a plain GNN is an unweighted sum, but sometimes a weighted sum is needed. Unlike the weighted sum in a CNN (where the weights are tied to positions), the weighted sum in a GNN must be insensitive to position in order to preserve the connectivity-invariance property. The attention mechanism satisfies this: the weight depends on the relationship between the vectors of two vertices. After attention, the resulting weights are multiplied with the vertex vectors and summed.
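
               A sketch of position-insensitive attention weighting over neighbours (using a plain dot-product score instead of GAT's learned attention, which is a simplifying assumption):

import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 4
V = rng.normal(size=(n, d))          # vertex vectors

A = np.zeros((n, n))
for a, b in [[1, 0], [2, 0], [4, 3], [6, 2], [7, 3], [7, 4], [7, 5]]:
    A[a, b] = A[b, a] = 1.0
A += np.eye(n)                       # let every vertex also attend to itself

# Score each pair of vertices from their vectors alone (dot product),
# so the weights depend on vertex content rather than on any position.
scores = V @ V.T
scores[A == 0] = -np.inf             # restrict attention to neighbours
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax

V_next = weights @ V                 # weighted sum of neighbour vectors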
