Neural Network Study Notes 4 - GNN (Graph Neural Networks) and GCN (Graph Convolutional Networks)

Series Article Directory

These notes draw on an English-language GNN review, combined with Mushen's accompanying video.



Graph Neural Networks

What is a graph:
A graph is a data structure that models nodes and the relationships between them. It is a typical example of non-Euclidean data in machine learning. Graph analysis can be used for node classification, link prediction, and clustering.

The sources of GNN:
CNN: CNNs can extract many local, compact features and compose them into higher-order features, but they can only operate on Euclidean data. The keys to CNNs are local connectivity, weight sharing, and multi-layer stacking;
Graph embedding: learning low-dimensional vector representations of graph nodes, edges, or subgraphs. The idea comes from representation learning and word embeddings. The first graph embedding method was DeepWalk, which treats nodes as words, generates random walks on the graph, and applies the SkipGram model to the walk sequences;

Based on these two ideas, a GNN aggregates information over the graph structure, so it can model the dependencies between input/output elements. GNNs can also use RNN kernels to model diffusion processes on the graph.

Advantages of GNNs:

  1. Standard neural networks (CNN, RNN) cannot handle the unordered nature of graph inputs, because they treat node features as inputs in a fixed order;
  2. An edge between two nodes carries dependency information. A standard neural network would have to fold this into the node features, whereas a GNN propagates it through the graph structure instead of treating it as just another feature; generally speaking, a GNN updates a node's hidden state via a weighted sum of its neighbors' states;
  3. Advanced artificial intelligence demands higher interpretability and reasoning ability; standard neural networks can generate synthetic images or documents but cannot generate graphs, whereas GNNs can generate graphs from unstructured data (with many applications: text classification, neural machine translation, relation extraction, image classification);

Shortcomings of GNNs:

  1. Iteratively updating the hidden states of nodes to a fixed point is inefficient;
  2. The same parameters are reused across iterations, and node hidden states are updated sequentially (most neural networks instead use different parameters in different layers);
  3. Some informative features on the edges cannot be modeled in the original GNN; how to learn hidden states for edges is also an open problem;

1. What is a graph?

A graph is composed of entities (vertices/nodes) and the relationships between them (edges).

1. Global U: global attributes such as the number of nodes or the longest path. The global context, sometimes called a master node, can be thought of as a virtual node connected to every vertex and edge in an abstract way.
2. Vertex V: node attributes such as node identity and number of neighbors.
3. Edge E: edge attributes such as edge identity and edge weight. Graphs can be specialized by associating directionality with edges (directed vs. undirected).

Conversion between images, sparse matrices, and graphs:
1. Image pixels: the pixel positions of the image
2. Adjacency matrix: a sparse matrix
3. Graph: an undirected graph network

The positional information of pixels such as (1,1) and (2,2) can be expressed either through a two-dimensional sparse adjacency matrix or through a graph network: each pixel becomes a node, and its position is fixed by its connections to neighboring pixels.
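As a concrete illustration, here is a small NumPy sketch (the helper and its name are my own) that builds the adjacency matrix of a pixel grid using 4-neighbor connectivity:

```python
import numpy as np

def grid_to_adjacency(h, w):
    """Build the adjacency matrix of an h x w pixel grid.

    Each pixel becomes a node; edges connect 4-neighbors
    (up/down/left/right), so a pixel's position is defined
    by its relations to neighboring pixels.
    """
    n = h * w
    A = np.zeros((n, n), dtype=int)
    for r in range(h):
        for c in range(w):
            i = r * w + c            # node index of pixel (r, c)
            if c + 1 < w:            # right neighbor
                A[i, i + 1] = A[i + 1, i] = 1
            if r + 1 < h:            # bottom neighbor
                A[i, i + w] = A[i + w, i] = 1
    return A

A = grid_to_adjacency(3, 3)   # 9 pixels -> 9x9 adjacency matrix
print(A.sum() // 2)           # 12 edges in a 3x3 4-neighbor grid
```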

2. Using graphs in machine learning

1. Expression of adjacency matrix

Machine learning models typically take rectangular or grid-like arrays as input. A graph has up to four types of information that we may wish to use for prediction: nodes V, edges E, the global context U, and the connectivity between nodes and edges.

Among these, the connectivity of the graph is the most complicated to encode. Perhaps the most obvious choice is an adjacency matrix, since it is easy to represent as a tensor. However, this representation has drawbacks. The number of nodes in a complex graph can reach millions, and the degree of each node can vary greatly, which leads to a very sparse adjacency matrix. This is not only space-inefficient; processing large sparse matrices efficiently on GPUs is still a hard, fundamental problem.

Beyond that, many different adjacency matrices can encode the same connectivity, and there is no guarantee that these different matrices produce the same result in a deep neural network (that is, such a network is not permutation invariant). In other words, the same graph can be converted into many different adjacency matrices depending on how its nodes are ordered; these matrices all express the same node-edge connectivity and describe the same graph, yet a neural network fed these different matrices can learn very different results. For just four connected nodes A, B, C, D, there are already 4! = 24 equivalent adjacency matrices.
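A quick NumPy check of this point (the path graph and relabeling are arbitrary choices of mine): permuting the node order of a 4-node path graph yields a different matrix that still encodes the same graph:

```python
import numpy as np

# Path graph A-B-C-D, nodes ordered [A, B, C, D]
A1 = np.array([[0, 1, 0, 0],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [0, 0, 1, 0]])

# Relabel the nodes in the order [B, D, A, C]
perm = [1, 3, 0, 2]
P = np.eye(4, dtype=int)[perm]   # permutation matrix
A2 = P @ A1 @ P.T                # same graph, different matrix

print(np.array_equal(A1, A2))    # False: the encodings differ
# The degree sequence (a graph invariant) is unchanged:
print(sorted(A1.sum(axis=0)), sorted(A2.sum(axis=0)))  # identical
```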


To address these problems, an elegant and memory-efficient alternative has been proposed: the adjacency list.
An adjacency list stores the connectivity as one entry per edge, each entry recording the indices of the two nodes that the edge connects. It avoids computing and storing the disconnected parts of the graph, and the ordering of its entries does not affect the graph it describes, so it is both efficient and order-independent.
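A minimal sketch of the idea in Python (the example graph is my own):

```python
# Adjacency-list encoding of the same 4-node path graph:
# one entry per edge, recording the pair of node indices it connects.
nodes = ["A", "B", "C", "D"]
edges = [(0, 1), (1, 2), (2, 3)]   # A-B, B-C, C-D

# Storage grows with the number of edges rather than |V|^2, so sparse
# or disconnected regions cost nothing, and reordering the entries
# does not change the graph being described.
neighbors = {i: [] for i in range(len(nodes))}
for u, v in edges:
    neighbors[u].append(v)
    neighbors[v].append(u)

print(neighbors)   # {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```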


2. GNN

1. Updating attributes independently

GNNs adopt a "graph-in, graph-out" architecture: the model accepts a graph as input, loads its node, edge, and global context attributes, and progressively updates these embeddings without changing the connectivity of the input graph. That is, only the attributes change, not the structure of the graph.

For example, use a separate multilayer perceptron (MLP) for each of U, V, and E: apply the node MLP to each node vector and return a learned node vector; do the same for each edge vector; and apply the global MLP to the global context vector to return a learned representation of the whole graph, as sketched below.
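A minimal PyTorch sketch of such a layer, assuming three separate MLPs and my own naming; this is an illustration of the idea, not a reference implementation:

```python
import torch
import torch.nn as nn

class IndependentGNNLayer(nn.Module):
    """One 'graph-in, graph-out' layer: separate MLPs transform the
    node, edge, and global vectors; connectivity is left untouched."""

    def __init__(self, d_node, d_edge, d_global):
        super().__init__()
        def mlp(d):
            return nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
        self.node_mlp = mlp(d_node)
        self.edge_mlp = mlp(d_edge)
        self.global_mlp = mlp(d_global)

    def forward(self, V, E, U):
        # V: (num_nodes, d_node), E: (num_edges, d_edge), U: (d_global,)
        return self.node_mlp(V), self.edge_mlp(E), self.global_mlp(U)

layer = IndependentGNNLayer(16, 16, 16)
V, E, U = torch.randn(4, 16), torch.randn(3, 16), torch.randn(16)
V, E, U = layer(V, E, U)   # attributes updated, structure unchanged
```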

Like other neural network modules or layers, we can stack these GNN layers together to build a simple GNN.
The output of the last GNN layer is then used for the task's prediction; the binary classification case below extends easily to multi-class classification or regression.

When the vector information is stored in the nodes and a binary or N-way classification is required per node, the graph already contains the node information, so the method is very simple: for each updated node, apply a fully connected layer + softmax to get the output, as in the sketch below.
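For instance (a hypothetical PyTorch snippet; dimensions are arbitrary):

```python
import torch
import torch.nn as nn

d_node, num_classes = 16, 2
head = nn.Linear(d_node, num_classes)   # fully connected layer per node

V = torch.randn(5, d_node)              # updated node vectors from the GNN
logits = head(V)                        # (num_nodes, num_classes)
probs = logits.softmax(dim=-1)          # per-node class probabilities
pred = probs.argmax(dim=-1)             # predicted class for each node
```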

When the vector information is stored in the edges but the nodes carry no information, and node-level predictions are still needed, information can be gathered from the edges E and handed to the nodes for prediction. This is done by pooling, followed by a fully connected layer + softmax. Pooling is performed in two steps (see the sketch after this list):

  1. For each node, collect the vectors of its incident edges E (and the global context U, if used), and stack them into a matrix.
  2. Aggregate the collected vectors, usually with a sum operation.
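A minimal PyTorch sketch of this two-step pooling, assuming edges are stored as an index list; the function name is my own:

```python
import torch

def pool_edges_to_nodes(E, edge_index, num_nodes):
    """Sum-pool edge vectors onto their endpoint nodes.

    E:          (num_edges, d) edge embeddings
    edge_index: (num_edges, 2) endpoint indices of each edge
    Returns:    (num_nodes, d) pooled node vectors
    """
    out = torch.zeros(num_nodes, E.shape[1])
    out.index_add_(0, edge_index[:, 0], E)   # add each edge to endpoint u
    out.index_add_(0, edge_index[:, 1], E)   # ...and to endpoint v
    return out

E = torch.ones(3, 4)                          # 3 edges, d = 4
edge_index = torch.tensor([[0, 1], [1, 2], [2, 3]])
print(pool_edges_to_nodes(E, edge_index, 4))  # nodes 1 and 2 receive two edges each
```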

There are many other variants: predicting edges when the vector information is stored in the edges, predicting edges when it is stored in the nodes, predicting the global context when it is stored in the nodes and edges, and so on. No matter where the vector information lives, aggregation operations can route it to whichever attribute we want to predict.

A simple GNN model proceeds as follows (see the sketch after this list):

  1. Input a graph
  2. Apply separate MLPs to U, V, and E
  3. Output a graph with updated attributes and unchanged structure
  4. Pooling layer + fully connected layer + softmax
  5. Output the result
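Putting the pieces together, a hypothetical forward pass following these five steps, reusing the sketches above (IndependentGNNLayer, pool_edges_to_nodes, head):

```python
import torch

# IndependentGNNLayer, pool_edges_to_nodes, and head are
# the earlier sketches, assumed to be in scope.
V = torch.randn(4, 16)                      # 1. input graph: 4 nodes,
E = torch.randn(3, 16)                      #    3 edges,
U = torch.randn(16)                         #    one global vector
edge_index = torch.tensor([[0, 1], [1, 2], [2, 3]])

layer = IndependentGNNLayer(16, 16, 16)     # 2. separate MLPs for U, V, E
V, E, U = layer(V, E, U)                    # 3. attributes updated, structure unchanged

V = V + pool_edges_to_nodes(E, edge_index, num_nodes=4)  # 4. pooling layer,
logits = head(V)                            #    fully connected layer,
probs = logits.softmax(dim=-1)              #    softmax
print(probs.argmax(dim=-1))                 # 5. per-node predictions
```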

2. V-E interactive updates

But this simplest GNN model does not use the graph's connectivity inside the GNN layers at all: each node is processed independently, as is each edge and the global context. Connectivity is used only when pooling information for prediction, so the information in the graph structure is not well exploited.

Pooling-style aggregation can fix this. Before a node in V is passed into the MLP, its neighboring nodes are found via the node-edge connectivity, their vectors are combined in an aggregation layer to produce a new vector, and that vector is then fed into the MLP. Adding this simple aggregation step resembles a convolution without learned weights on the sum, and this is also the idea behind graph convolutional networks. A sketch follows.
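A sketch of one such message-passing step (a parameter-free neighbor sum followed by a shared MLP; the names and setup are my own):

```python
import torch
import torch.nn as nn

def message_passing_step(V, edge_index, mlp):
    """One message-passing update: each node sums its neighbors'
    vectors with its own (the weight-free 'convolution'), then a
    shared MLP transforms the aggregate."""
    agg = V.clone()                                           # include the node itself
    agg.index_add_(0, edge_index[:, 0], V[edge_index[:, 1]])  # v -> u
    agg.index_add_(0, edge_index[:, 1], V[edge_index[:, 0]])  # u -> v
    return mlp(agg)

V = torch.randn(4, 8)
edge_index = torch.tensor([[0, 1], [1, 2], [2, 3]])
mlp = nn.Sequential(nn.Linear(8, 8), nn.ReLU())
V = message_passing_step(V, edge_index, mlp)   # neighbors aggregated, then MLP
```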

By stacking message-passing GNN layers, a node can eventually integrate information from across the entire graph: after three layers, for example, a node carries information about the nodes three steps away from it, realizing longer-distance information transfer across the graph.
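A tiny NumPy check of this receptive-field effect: the nonzero pattern of (A + I)^k shows which nodes can influence which after k layers (the path graph is my own example):

```python
import numpy as np

A = np.array([[0, 1, 0, 0],        # 4-node path A-B-C-D
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
M = A + np.eye(4, dtype=int)       # each node also keeps its own state

print((np.linalg.matrix_power(M, 1) > 0).astype(int))  # 1 layer: only direct neighbors
print((np.linalg.matrix_power(M, 3) > 0).astype(int))  # 3 layers: A's info reaches D
```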

This raises a natural question about the exchange of information between V and E.

Following the idea of point-edge aggregation, new node and edge representations can be combined in several ways: node-to-node (linear), edge-to-edge (linear), node-to-edge (in the edge layer), edge-to-node (in the node layer), and further combinations. Different orderings give different results (see the sketch after this list):

  1. Vertex to edge (edge update), then edge to vertex (node update)
  2. Edge to vertex (node update), then vertex to edge (edge update)
  3. Alternating updates, applying both directions simultaneously
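A hypothetical sketch of ordering 1 (swap the two halves for ordering 2; run both halves from the same inputs for ordering 3); all names and dimensions are mine:

```python
import torch
import torch.nn as nn

def ve_update(V, E, edge_index, edge_mlp, node_mlp):
    """Vertex-to-edge (edge update), then edge-to-vertex (node update)."""
    u, v = edge_index[:, 0], edge_index[:, 1]
    # vertex -> edge: each edge sees its two endpoint vectors, then updates
    E = edge_mlp(torch.cat([E, V[u], V[v]], dim=-1))
    # edge -> vertex: each node pools its incident (already updated) edges
    pooled = torch.zeros_like(V)
    pooled.index_add_(0, u, E)
    pooled.index_add_(0, v, E)
    return node_mlp(torch.cat([V, pooled], dim=-1)), E

d = 8
edge_mlp = nn.Linear(3 * d, d)   # (edge, endpoint u, endpoint v) -> new edge
node_mlp = nn.Linear(2 * d, d)   # (node, pooled edges) -> new node
V, E = torch.randn(4, d), torch.randn(3, d)
edge_index = torch.tensor([[0, 1], [1, 2], [2, 3]])
V, E = ve_update(V, E, edge_index, edge_mlp, node_mlp)
```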

3. U-V-E interactive updates

This network still has a flaw: two nodes that are far apart in the graph may be unable to exchange information effectively and quickly. With k layers, information propagates at most k steps, creating an information lag; if such a node at the periphery of the network is important, this can have a real impact. One solution is to let all nodes pass information to each other directly. Unfortunately, this quickly becomes computationally expensive for large graphs (although this approach, called "virtual edges", has been used for small graphs such as molecules).

Another solution is to use a global representation of the graph (U), sometimes called a master node or context vector. This global context vector is connected to all other nodes and edges in the network and can act as a bridge between them to pass information (see the sketch after this list).

  1. U is connected to V: when information is pooled from vertices to edges, U is passed along as well.
  2. U is connected to E: when information is pooled from edges to vertices, U is passed along as well.
  3. When U updates itself, it merges the pooled information of all vertices and edges.
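A hypothetical sketch of how U can join the node update and then refresh itself from pooled node and edge summaries; concatenation is just one possible conditioning choice, and all names are mine:

```python
import torch
import torch.nn as nn

def uve_update(V, E, U, node_mlp, global_mlp):
    n = V.shape[0]
    # U participates in every node's update (items 1-2, via concatenation)
    V = node_mlp(torch.cat([V, U.expand(n, -1)], dim=-1))
    # U's self-update merges summaries of all vertices and edges (item 3)
    U = global_mlp(torch.cat([U, V.mean(dim=0), E.mean(dim=0)], dim=-1))
    return V, U

d = 8
node_mlp = nn.Linear(2 * d, d)
global_mlp = nn.Linear(3 * d, d)
V, E, U = torch.randn(4, d), torch.randn(3, d), torch.randn(d)
V, U = uve_update(V, E, U, node_mlp, global_mlp)
```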

3. GCN

GCN is also a neural network layer. Its layer-to-layer propagation rule is:

H^(l+1) = σ( D̂^(-1/2) Â D̂^(-1/2) H^(l) W^(l) )
where:

  1. Â = A + I, where A is the adjacency matrix and I is the identity matrix (adding self-loops)
  2. D̂ is the degree matrix of Â, with D̂_ii = ∑_j Â_ij
  3. H^(l) is the feature matrix of layer l; for the input layer, H^(0) = X
  4. W^(l) is the trainable weight matrix of layer l, and σ is the non-linear activation function
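A minimal NumPy sketch of this propagation rule, using ReLU as σ (all names are mine):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step:
    H' = sigma( D^-1/2 (A + I) D^-1/2 H W ), with sigma = ReLU."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)                     # degrees of A_hat
    D_inv_sqrt = np.diag(d ** -0.5)           # D^-1/2
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
    return np.maximum(0, A_norm @ H @ W)      # ReLU activation

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # 3-node path
H = np.random.randn(3, 4)                     # input features X
W = np.random.randn(4, 2)                     # trainable weights
print(gcn_layer(A, H, W).shape)               # (3, 2)
```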

A graph convolutional network derives a node's representation from the information of other nodes.
Moreover, every node in the graph is constantly changing its state under the influence of its neighbors and of more distant nodes, until an equilibrium is reached; the closer the relationship, the greater a neighbor's influence.

The spatial features of graph data have the following characteristics:
1) Node features: each node has its own features (carried by the nodes);
2) Structural features: each node in graph data carries structural information, i.e., the connections between nodes (carried by the edges).

In general, graph data must account for both node information and structural information. A graph convolutional network can automatically learn not only the node features but also the association information between nodes.


Summary

Representations are learned for all graph attributes, so during pooling we can exploit them by conditioning the attribute we are interested in on the others. For example, for a node we can draw on information from neighboring nodes, connecting edges, and the global context. To let a new node embedding use all these possible sources of information, they can simply be concatenated. Alternatively, they can be mapped into the same space via linear maps and added, or a feature-wise modulation layer can be applied, which can be seen as a feature-based attention mechanism.

It is recommended to read the original English blog and play with its interactive diagrams, which can greatly improve understanding.

Origin: blog.csdn.net/qq_45848817/article/details/127207781