A clear explanation of the principles of Graph Neural Networks (GNN)

1. What is a graph?

A graph represents the relationships (edges) between entities (nodes).
V: vertices, E: edges, U: the whole graph.
Graphs generally come in two kinds: directed and undirected.
In neural networks, the most important question is how to "reconstruct features", and the same holds on a graph. Each node has its own features. To predict what a person in the graph likes, we must consider not only that person but also their friends, and who those friends are connected to. So the features of "nodes" need to be reconstructed, and the features of "edges" need to be reconstructed as well.
Ultimately, what do we predict? Classification/regression on "nodes", and classification/regression on "edges" or the whole "graph".

2. The adjacency matrix

2.1 Images

How is data represented as a graph?
How can an image be represented as a graph?
For example, a 224x224 RGB image is represented as a 3-channel tensor before being fed into a CNN.
From another perspective, it can be viewed as a "graph": each pixel is a "node", and the relationship between adjacent pixels can be expressed as an edge between two nodes. The point to emphasize: each pixel is connected to the pixels surrounding it.
The schematic below maps each pixel of the image to a node in the "graph structure".
In the middle is the adjacency matrix (a large sparse matrix that records the relationships between nodes: who is connected to whom, and who has no relationship with whom). For example, pixel (0,0) is connected to (1,0), (0,1), and (1,1), but the computer cannot see that directly, so the "adjacency matrix" is used, with the corresponding entries marked blue. There are 25 pixels in total, so the adjacency matrix is 25x25 (n x n).
A is usually used to denote the adjacency matrix of the graph. After building a GNN, A is fed into the network together with the node features X: GNN(A, X).
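As a small sketch of the idea (assuming 8-connectivity between neighboring pixels, as the schematic suggests; the grid size and indexing are illustrative):

```python
import numpy as np

# Build the adjacency matrix of a 5x5 pixel grid: n = 25 pixels -> a 25x25 matrix A.
# Assumption: each pixel is connected to its 8 surrounding pixels.
H = W = 5
n = H * W
A = np.zeros((n, n), dtype=int)

def idx(r, c):
    """Flatten (row, col) pixel coordinates to a single node index."""
    return r * W + c

for r in range(H):
    for c in range(W):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == dc == 0:
                    continue  # skip the pixel itself
                rr, cc = r + dr, c + dc
                if 0 <= rr < H and 0 <= cc < W:
                    A[idx(r, c), idx(rr, cc)] = 1

# Pixel (0,0) is connected to (0,1), (1,0) and (1,1):
print(A[idx(0, 0), idx(0, 1)], A[idx(0, 0), idx(1, 0)], A[idx(0, 0), idx(1, 1)])
```

Note how sparse the result is: most of the 25x25 entries are zero, which is exactly why storing A densely becomes wasteful for large graphs.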

2.2 Text

How can text be represented as a graph?
Text is a sequence: each word can be treated as a vertex, with a directed edge from each word to the word that follows it.
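A minimal sketch (the sentence is made up for illustration):

```python
# Turn a word sequence into a directed graph: one vertex per word,
# and a directed edge from each word to the next word.
sentence = "graphs are all around us".split()
nodes = sentence
edges = [(i, i + 1) for i in range(len(sentence) - 1)]  # (source, target) pairs

print(nodes)  # ['graphs', 'are', 'all', 'around', 'us']
print(edges)  # [(0, 1), (1, 2), (2, 3), (3, 4)]
```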

2.3 Other examples

Another example is a molecule, such as this fragrance molecule: its atoms are held together by bonds. Each atom can be treated as a vertex of the "graph", and the bonds connecting the atoms form the edges.
In the karate club example, each of the two instructors has sparred with certain students; putting everyone together forms a social graph.

When we write research papers and cite references, each citation is in fact a directed edge.

Commonly used graph datasets:

3. Common applications

In vision and NLP, for example image-processing tasks, when a dataset is split into batches and fed to the network, every image in a batch has the same dimensions. If they differ (say the first image is 200x200 and the third is 224x224), they are resized to the same size. With this uniform format, good results can be achieved directly with "convolution". In NLP, likewise, the inputs for the previous word and the next word have a fixed size.
But:
In chemistry, for instance, can the one-size-fits-all "convolution" still handle molecular structures? Some molecules contain 100 oxygen atoms, while others are built from 300 carbon atoms.
Or take traffic-flow prediction, where each intersection is treated as a vertex: the traffic at every intersection is different, and the "graph" formed by each city is different. For cases like these, a "graph" is a good fit.
Traditional neural networks such as CNNs only suit input data with a fixed structure, while "graphs" suit data with a more irregular structure.
After representing the data as a graph, three types of problems can be defined on it:
(1) Graph level
Classify whole graphs, for example identifying which graphs contain a cycle.

(2) Vertex level
For example, in the karate-club social graph, the two instructors split up, and we must decide whether each student's vertex belongs to instructor A or instructor B.

(3) Edge level
For example, after semantic segmentation extracts each entity in a scene, predict the attributes of the relationship between each pair of entities.

4. How graphs are represented in practice

When using neural networks on graphs, the biggest question is how to represent the graph so that it is compatible with the neural network.
A graph generally carries four kinds of information: vertex attributes, edge attributes, global information, and connectivity (which two vertices each edge connects). The first three can all be represented as "vectors" (the attributes of each vertex become a vector), and neural networks are very friendly to vectors.
What about the crucial connectivity? Should we use the adjacency matrix from before (n vertices give an n x n matrix)?
In practice, imagine an n x n adjacency matrix with tens of thousands of vertices: the matrix becomes enormous, and maintaining it is inefficient and demands a lot of computing resources.
In actual code, the connectivity is stored as a 2 x n matrix, where n is the number of edges and the 2 entries hold [source, target]: for example, [1, 0] in the figure means an edge from vertex 1 to vertex 0.
It is stored like this: 8 vertices, 7 edges, each vertex's attribute a scalar. The adjacency list has the same length as the number of edges, and its i-th entry records which two vertices the i-th edge connects. For storage, only the edges and the attributes are kept; computation is more efficient, and the order of the edges does not matter.
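The storage scheme above can be sketched like this (the connectivity and the attribute values are hypothetical toy data, not taken from the figure):

```python
import numpy as np

# Edge-list (COO) storage: 8 vertices, 7 edges.
# Instead of an 8x8 adjacency matrix, keep a 2 x num_edges array:
# row 0 holds the source vertices, row 1 the target vertices.
node_attrs = np.array([0., 1., 2., 3., 4., 5., 6., 7.])  # one scalar per vertex (toy)
edge_index = np.array([
    [1, 2, 3, 4, 5, 6, 7],   # sources (hypothetical)
    [0, 0, 0, 0, 2, 2, 2],   # targets (hypothetical)
])

# The i-th column says which two vertices the i-th edge connects:
src, dst = edge_index[:, 0]
print(src, dst)  # edge 0 goes from vertex 1 to vertex 0
```

This is the same layout that graph libraries such as PyTorch Geometric use for their `edge_index` tensor.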

5. Message passing

The vertices connected to each vertex differ, so how do we reconstruct each vertex's features? For message passing and feature reconstruction, just remember: on the one hand consider yourself, and on the other hand consider the features of your neighbors.
Here x6 is updated, and the result looks as shown, where W is a learnable parameter, similar to the weights of a fully connected layer in a convolutional neural network.
Sum, mean, and so on: choose whichever aggregation suits the task.
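A minimal numeric sketch of one such update (the neighbor indices and W are made up; W is fixed to the identity here only so the result is easy to check by hand, whereas in a real model it is learned):

```python
import numpy as np

# One message-passing update for a single vertex x6 with (assumed) neighbors x3, x5.
x = {3: np.array([1.0, 0.0]),
     5: np.array([0.0, 2.0]),
     6: np.array([1.0, 1.0])}
neighbors_of_6 = [3, 5]

W = np.eye(2)  # learnable weight matrix; identity only for this demo

# Consider yourself AND your neighbors: mean of self + neighbors, then transform.
agg = (x[6] + sum(x[j] for j in neighbors_of_6)) / (1 + len(neighbors_of_6))
x6_new = W @ agg
print(x6_new)  # mean of [1,1], [1,0], [0,2] -> [0.666..., 1.0]
```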

6. Graph Neural Networks (GNN)

The most important job of a GNN is to learn the best possible features for each vertex; different tasks can then be completed on top of those features.

6.1 Multi-layer GNN

A GNN is an optimizable transformation of all the attributes on a graph (vertices, edges, and global context) that preserves the graph's symmetries: permuting the vertices does not change the overall result.
It is a message passing neural network: the input of a GNN is a graph, and the output is also a graph. It transforms the attribute vectors (vertices, edges, global context) but does not change the "graph connectivity"; the adjacency matrix stays the same.
The more layers there are, the larger the "receptive field" and the more complete the global information. For example, at the start x1 only considers itself and its two directly connected neighbors. But after the first GNN layer, x2 has been updated using itself plus x3 and x4, so when x1 pulls in x2 at the next layer, it is the updated x2. This means x1 indirectly takes x3 and x4 into account as well.
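The growing receptive field can be checked numerically. A sketch on a hypothetical chain graph x1 - x2 - x3 - x4 (with self-loops), where only x3 carries a nonzero feature:

```python
import numpy as np

# Chain graph with self-loops: after one round of mean aggregation, x1 has seen
# nothing of x3; after two rounds, x3's signal reaches x1 via x2.
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
], dtype=float)
A_mean = A / A.sum(axis=1, keepdims=True)  # row-normalized: mean over self + neighbors

X = np.array([[0.0], [0.0], [1.0], [0.0]])  # only x3 (index 2) carries signal

h1 = A_mean @ X   # layer 1: x1 (index 0) still sees nothing of x3
h2 = A_mean @ h1  # layer 2: x3's signal has propagated to x1
print(h1[0, 0], h2[0, 0])  # 0.0, then a positive value
```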
After obtaining each vertex's final updated features, we can add a max-pooling layer and connect a fully connected layer, and then perform classification or regression on vertices, edges, or the whole graph.

6.2 CNN vs. GNN

Graph convolution differs from a convolutional neural network: there is no sliding window in graph convolution. Graph convolution is based on message passing, updating each vertex from itself and its neighbors.
Traditional machine learning cannot handle input data whose structure is inconsistent and irregular.
To obtain the features of each vertex, we do not need to hand-engineer them ourselves; just leave it to the graph neural network: GCN(per-vertex features X, adjacency matrix A).

6.3 Semi-supervised learning

There are many vertices in a graph, and it is impossible to obtain a label for every one. But training does not require all labels: when computing the loss, use only the labeled vertices. Although some vertices have no labels, every vertex in the graph is updated from itself and its surrounding neighbors, so the unlabeled vertices still contribute.
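A sketch of such a masked loss (predictions, labels, and the mask are toy values; a plain squared error is used just for illustration):

```python
import numpy as np

# All 4 vertices take part in message passing, but only the labeled ones
# (train_mask == True) contribute to the loss.
pred = np.array([0.9, 0.2, 0.6, 0.4])    # model outputs for 4 vertices
label = np.array([1.0, 0.0, 1.0, 0.0])   # last two labels are unknown/ignored
train_mask = np.array([True, True, False, False])

loss = np.mean((pred[train_mask] - label[train_mask]) ** 2)
print(loss)  # averages the error over the 2 labeled vertices only
```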

6.4 Basic idea of GCN

Each vertex has its own initial features. Now we want to update the yellow vertex to obtain new features.
Step one: message passing (aggregation), summing and averaging the features of the vertex itself and its surrounding neighbors (a simple scheme). Then pass the result through a neural network (an FC fully connected layer, which has trainable parameters), finally producing 2 features (3-d features become 2-d features).
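This step can be sketched as follows (the features are toy values, and the FC weights are random stand-ins for trained parameters):

```python
import numpy as np

# GCN-style update for one vertex: mean-aggregate self + neighbors,
# then a fully connected layer mapping 3-d features to 2-d.
X = np.array([
    [1.0, 0.0, 2.0],   # the vertex being updated
    [0.0, 1.0, 1.0],   # neighbor 1
    [2.0, 2.0, 0.0],   # neighbor 2
])

agg = X.mean(axis=0)             # aggregation step -> [1.0, 1.0, 1.0]

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 2))  # trainable FC weights (random for the demo)
h_new = agg @ W                  # 3-d features -> 2-d features
print(h_new.shape)  # (2,)
```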
Like a CNN, a GCN can also stack multiple layers, although in practice GCNs are not very deep. As in this figure, if every vertex wants to reach the farthest vertex, 2 layers are enough.
After the graph comes in, it passes through the first graph-convolution layer, during which every vertex is updated. Then comes an activation function, then the second graph-convolution layer, and so on. The final output is the vector corresponding to each vertex.
In more detail:
For the vertex vectors V, edge vectors E, and global vector U, construct one MLP each (the input and output sizes are the same; all vertices share one MLP, all edges share another, and so on). These 3 MLPs form one GNN layer: each attribute is fed into its corresponding MLP and updated accordingly.
All the attributes are updated, but the structure of the graph does not change. Also, each MLP acts only on its own attribute; it does not consider any connectivity information.
The output of the last layer gives the predicted values:
To predict vertices, such as which instructor each karate student follows, the task actually becomes a binary classification problem. After obtaining the vector of each vertex, connect a fully connected layer with 2 outputs and add a softmax to get the final output. (Note that all vertices share the parameters of this fully connected layer.)
Suppose we still want to predict a vertex, but we have no vector for that vertex. Pooling is used here (not much different from pooling in a CNN).
If a vertex has no vector of its own and we still want a vector to predict from, we can take the vectors of the edges connected to that vertex, together with the global vector; say that gives 5 vectors. Add all 5 vectors, and the result represents this vertex (the dimensions must be consistent).
Similarly, if an edge has no vector, since the edge connects two vertices, we can add the vectors of those two vertices, optionally together with the global vector, and the result is the vector for this edge.
The same holds for the global vector: add up all the vertex and edge vectors.

The final GNN:
First a series of GNN layers, each containing three MLPs corresponding to the 3 different attribute types. The final output is again a graph structure (with all the attributes inside updated and transformed). Finally, add an appropriate output layer depending on which attribute you want to predict; if some information is missing, use the pooling aggregation described above.
But in the simple GNN above we never actually use the graph structure: when transforming the attributes, each attribute enters its MLP on its own. We never look at which edges a vertex touches or which vertices an edge connects, so the connectivity of the graph never flows into the attributes, and the final result is unlikely to fully represent the whole graph.
The fix: before feeding a vertex's vector into the MLP, add to it the vectors of its neighboring vertices to obtain a pooled vector. This is somewhat similar to convolution on an image.
After passing messages from vertices to edges and from edges to vertices, apply the MLP updates.
The two different orders (vertex-to-edge first versus edge-to-vertex first) lead to different results.
You can also consider adding in the global information:

6.5 Basic components of a graph

A: the adjacency matrix, recording how the vertices in the graph are connected.
D: the degree matrix. For example, vertex E in the figure is connected to the four surrounding vertices, so its entry in the degree matrix is 4. It records how many vertices each vertex is related to.
X: the feature vector of each vertex.
A x X, the adjacency matrix times the feature matrix X, aggregates the information of each vertex's neighbors.
For example, when reconstructing vertex A's features here, the adjacency matrix shows that A and E are connected. But if you compute A x X, you will find that A itself is not taken into account.
So the identity matrix I needs to be added to the adjacency matrix (when reconstructing features, you must consider not only the neighbors but also yourself).
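A tiny numeric check (the 3-vertex graph here is hypothetical):

```python
import numpy as np

# A @ X aggregates neighbors only; adding the identity I (self-loops)
# makes each vertex include its own features as well.
A = np.array([
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
], dtype=float)
X = np.array([[1.0], [2.0], [3.0]])

neighbors_only = A @ X     # vertex 0 gets 2 + 3 = 5; its own 1 is missing
A_hat = A + np.eye(3)      # A + I: add self-loops
with_self = A_hat @ X      # vertex 0 now gets 1 + 2 + 3 = 6
print(neighbors_only[0, 0], with_self[0, 0])  # 5.0 6.0
```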
However, if you directly multiply this new adjacency matrix by the feature matrix, vertices with more connections simply end up with larger values. It is not the case that the more vertices you connect to, the "bigger" you should be; it is more reasonable to take an average.
So here we bring in the degree matrix D (like the adjacency matrix A, it must also account for the vertex itself: diagonal + 1). Taking the inverse of the D matrix acts like averaging, because the diagonal of D holds the number of neighbors of each vertex, which plays the role of a weight, and its reciprocal performs the averaging.
Multiplying the inverse of the D matrix by the adjacency matrix yields a new, averaged adjacency matrix.
There is a problem here: left-multiplying by the inverse of D only normalizes the row dimension of the adjacency matrix A. The columns should also be taken care of, right?
Right-multiplying by the inverse of the degree matrix D correspondingly normalizes the column dimension.
However, although normalization is now done in both the row and column dimensions, each value is effectively normalized twice, which makes the values very small. The solution: take the square root on both sides, i.e. use D^(-1/2) on the left and on the right.
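The whole normalization can be sketched on the same kind of toy graph (the graph itself is hypothetical):

```python
import numpy as np

# Symmetric normalization: D^(-1/2) (A + I) D^(-1/2).
A = np.array([
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
], dtype=float)

A_hat = A + np.eye(3)                     # self-loops: diagonal + 1
deg = A_hat.sum(axis=1)                   # degrees incl. self-loop: [3, 2, 2]
D_inv_sqrt = np.diag(deg ** -0.5)
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # rows AND columns normalized, once each

print(np.round(A_norm, 3))
```

The result stays symmetric, and each entry A_hat[i, j] is divided by sqrt(d_i) * sqrt(d_j) rather than by d_i * d_j, which avoids shrinking the values twice.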
The overall 2-layer GCN is then Z = f(X, A) = softmax(Â ReLU(Â X W(0)) W(1)), where Â in the formula is the adjacency matrix with self-loops, normalized on both the left and the right: Â = D^(-1/2) (A + I) D^(-1/2).
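Putting everything together, a minimal sketch of the 2-layer forward pass (the graph, feature sizes, and weights are toy values; W0 and W1 stand in for trained parameters):

```python
import numpy as np

def normalize(A):
    """D^(-1/2) (A + I) D^(-1/2): self-loops plus symmetric normalization."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = A_hat.sum(axis=1) ** -0.5
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
X = rng.standard_normal((4, 3))   # 4 vertices, 3 input features
W0 = rng.standard_normal((3, 8))  # hidden width 8 (arbitrary)
W1 = rng.standard_normal((8, 2))  # 2 output classes

A_norm = normalize(A)
# Z = softmax(A_norm @ ReLU(A_norm @ X @ W0) @ W1)
Z = softmax(A_norm @ np.maximum(A_norm @ X @ W0, 0) @ W1)
print(Z.shape)  # (4, 2): one class distribution per vertex
```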


Origin: blog.csdn.net/weixin_50557558/article/details/131739329