Graph Neural Network GNN

I had often come across graph neural networks before, but I always found them difficult and never followed through. Now I have taken the time to study them and have a basic understanding of what a GNN is. I have not yet practiced with code; I will keep updating this as I continue to learn. The content is recorded here for easy reference later.

Basics of graphs

Basic knowledge of graphs

Basic elements of a graph

A graph is composed of points and lines and can represent relationships between entities: the points (vertices) are the entities, and the lines (edges) are the relationships between them.

(figures)

To further describe each node, edge, or the entire graph, we can store information in each part of the graph.

(figure)

Each vertex, each edge, and the entire graph can be represented by a vector. In this example, each vertex is represented by a vector of length 6 (the height of the cylinders indicates the magnitude of the values), each edge by a vector of length 8, and the global context by a vector of length 5.

Representation of a graph - adjacency matrix

(figure)

Representing different kinds of content as graphs

Images as graphs

(figure)

Sentences as graphs

(figure)

Molecules as graphs

(figure)

Social relationships as graphs

(figure)

Representation of a graph - adjacency list

A graph carries four kinds of information: vertex attributes, edge attributes, global information, and connectivity (that is, which two vertices each edge connects). The first three can be represented by vectors. How do we represent connectivity?

We can use an adjacency matrix. This matrix is square, but it has some problems: it can be very large and sparse, which is space-inefficient and awkward to compute with. In addition, exchanging the order of rows and columns of an adjacency matrix does not change the graph it describes.

For example, the two figures below are both the character-relationship graph of "Othello". They look different only because the rows and columns are ordered differently, but the information they represent is the same. This means that if you design a neural network, it must produce the same result no matter which of the two matrices you feed it.

(figure)

The example below shows every adjacency matrix that can describe this small graph of 4 nodes.

(figure)
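
As a concrete illustration, here is a minimal NumPy sketch of building an adjacency matrix (the 4-node edge list is made up for illustration):

```python
import numpy as np

# A made-up 4-node graph; each pair (i, j) is an undirected edge.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

n = 4
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1   # undirected graph: symmetric matrix

print(A)
# Relabeling the nodes permutes rows and columns identically (P A P^T),
# producing a different-looking matrix that describes the same graph.
```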

If you want to store connectivity efficiently, and want the ordering not to affect the results of the neural network, you can use an adjacency list instead of the adjacency matrix.

For example, as shown below, vertices, edges, and global information are all represented by scalars (they could equally be vectors), and connectivity is represented by an adjacency list. The adjacency list has one entry per edge: the i-th entry holds the two vertices connected by the i-th edge. This representation is compact and unaffected by ordering.

(figure)
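
A minimal sketch of this whole representation in NumPy (the vector lengths 6, 8, and 5 follow the earlier example; the values and edge list are made up):

```python
import numpy as np

num_nodes, num_edges = 4, 4
nodes = np.random.randn(num_nodes, 6)      # one length-6 vector per vertex
edges = np.random.randn(num_edges, 8)      # one length-8 vector per edge
u = np.random.randn(5)                     # one length-5 global vector

# Connectivity as an adjacency list: entry i holds the two vertices
# joined by the i-th edge. Reordering the entries does not change the graph.
adjacency_list = [(0, 1), (1, 2), (2, 3), (3, 0)]
```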

Degree and Neighbors of Graphs

(figure)
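
Degree (the number of edges incident to a node) and neighborhood fall straight out of the adjacency list; a small sketch, reusing the toy edge list from above:

```python
from collections import defaultdict

adjacency_list = [(0, 1), (1, 2), (2, 3), (3, 0)]

neighbors = defaultdict(set)
for i, j in adjacency_list:
    neighbors[i].add(j)
    neighbors[j].add(i)

degree = {v: len(ns) for v, ns in neighbors.items()}
print(neighbors[0], degree[0])   # {1, 3} 2  -> node 0 has two neighbors, degree 2
```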

Structural features, node features, edge features

(figure)

Classification of graphs

(figures)

Advantages of graph learning

(figures)

Applications of graph learning

(figure)

Node-level tasks

In the well-known karate club example: if the two instructors fall out, predict which instructor each student will side with, based on the social graph.

(figure)

Financial fraud detection

(figure)

Object detection

(figure)

Edge-level tasks

First, an image is given; the people in it are segmented by semantic segmentation, and then the relationships between them are predicted. The vertices of this graph already exist, so the task amounts to predicting the attributes of the edges.

(figure)

Recommender systems

(figure)

Graph-level tasks

In this task, the goal is to predict a property of the entire graph, for example to identify whether the graph contains two rings; this is a classification problem.

(figure)

Odor recognition

(figure)

Classification of graph learning algorithms

(figure)

Graph Neural Network GNN

What is a GNN

A GNN is an optimizable transformation on all attributes of the graph (nodes, edges, global-context) that preserves graph symmetries (permutation invariances).


The GNN here is built on the message passing neural network framework. The input of a GNN is a graph, and its output is also a graph: it transforms the graph's attributes (vertex, edge, and global information) but does not change the graph's connectivity, i.e., which edge connects which vertices stays fixed.

No matter how complicated things get, the purpose of using a graph neural network is to aggregate features.

The purpose of graph neural networks

(figure)

The simplest GNN layer

Construct a separate multi-layer perceptron (MLP) for the vertex vectors, the edge vectors, and the global vector, with input and output of the same size. These three MLPs form one GNN layer: the input is a graph, the output is also a graph, and the connectivity is unchanged.

(figure)
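
A minimal sketch of such a layer, assuming PyTorch (the class and attribute names are my own; the point is only that three independent MLPs transform the attributes while the adjacency list passes through untouched):

```python
import torch
import torch.nn as nn

class SimplestGNNLayer(nn.Module):
    """Three independent MLPs; connectivity passes through unchanged."""
    def __init__(self, node_dim=6, edge_dim=8, global_dim=5):
        super().__init__()
        self.node_mlp   = nn.Sequential(nn.Linear(node_dim, node_dim), nn.ReLU())
        self.edge_mlp   = nn.Sequential(nn.Linear(edge_dim, edge_dim), nn.ReLU())
        self.global_mlp = nn.Sequential(nn.Linear(global_dim, global_dim), nn.ReLU())

    def forward(self, nodes, edges, u, adjacency_list):
        # Each MLP is applied to every vector independently, so permuting the
        # nodes or edges just permutes the outputs: the graph's symmetry is preserved.
        return (self.node_mlp(nodes), self.edge_mlp(edges),
                self.global_mlp(u), adjacency_list)
```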

This satisfies the first requirement above: it only transforms attributes and does not change the structure of the graph. And since each MLP acts on every vector independently, making no assumption about the order of the samples, it also preserves the permutation invariance of the graph, satisfying the second requirement.

GNN predictions by pooling information

We have built a simple GNN, but how do we make predictions for the task described above?

Consider the simplest case, such as the karate club social graph mentioned earlier. To predict which instructor each student will eventually choose from that graph, the vertices are already represented by vectors, so we can simply add a fully connected layer with two outputs on each vertex.

(figure)

However, if the vertices carry no information and the information is stored on the edges, we need a way to collect edge information for vertex prediction. We can accomplish this with pooling.

(figure)

If we only have edge features and are trying to predict binary node information, we can use pooling to route (or pass) the information to where it is needed; the model looks like this:

(figure)

If we only have node features and are trying to predict binary edge information, the model is as follows:

(figure)

If we only have node-level features and need to predict a binary global attribute, we collect all available node information and aggregate it. This is similar to global average pooling in a CNN. The same operation can be done for edges.

(figure)
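
The pooling operations described above are easy to sketch; below is a minimal PyTorch version with sum pooling from edges to nodes and mean pooling from nodes to the global level (function names are illustrative):

```python
import torch

def pool_edges_to_nodes(edges, adjacency_list, num_nodes):
    """Sum-pool the vectors of all edges incident to each node."""
    out = torch.zeros(num_nodes, edges.shape[1])
    for e, (i, j) in enumerate(adjacency_list):
        out[i] += edges[e]
        out[j] += edges[e]
    return out

def pool_nodes_to_global(nodes):
    """Aggregate every node vector into one graph-level vector,
    analogous to global average pooling in a CNN."""
    return nodes.mean(dim=0)

edges = torch.randn(4, 8)
node_feats = pool_edges_to_nodes(edges, [(0, 1), (1, 2), (2, 3), (3, 0)], num_nodes=4)
graph_feat = pool_nodes_to_global(node_feats)   # shape: (8,)
```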

We can summarize the simplest GNN model as the following structure: a graph is input; after passing through the GNN layers (essentially three MLPs for vertices, edges, and the global context), a graph with transformed attributes but unchanged connectivity is output; finally, pooling and a fully connected layer produce the prediction.

(figure)

In this structure we have not yet taken advantage of the connectivity of the graph. Each part is processed separately, and information from other parts is used only during pooling. Next, let's see how to gather information from across the whole graph.

Passing messages between parts of the graph

We can use pooling inside the GNN layer to make it aware of the graph's connectivity and thus make more complex predictions. This is done by message passing, in which neighboring nodes or edges exchange information and influence each other's updates.

Message passing proceeds in three steps (see the sketch after the figure below):

  1. For each node in the graph, gather all neighboring node embeddings (the "messages").

  2. Aggregate all the messages with an aggregation function (such as sum).

  3. Pass the aggregated messages through an update function, usually a learned neural network.

(figure)
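
A minimal sketch of one such round, assuming PyTorch (sum aggregation, and a single learned linear layer standing in for the update function):

```python
import torch
import torch.nn as nn

def message_passing_step(nodes, adjacency_list, update):
    """One round: gather neighbor embeddings, sum them, update every node."""
    agg = torch.zeros_like(nodes)
    for i, j in adjacency_list:      # step 1: collect messages along each edge
        agg[i] += nodes[j]           # step 2: aggregate them with a sum
        agg[j] += nodes[i]
    # step 3: the node's own embedding and its aggregated messages go
    # through a learned update function
    return update(torch.cat([nodes, agg], dim=1))

nodes = torch.randn(4, 6)
update = nn.Linear(2 * 6, 6)         # stands in for a learned MLP
nodes = message_passing_step(nodes, [(0, 1), (1, 2), (2, 3), (3, 0)], update)
```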

Message passing can occur between vertices or between edges; this is the key to exploiting the graph's connectivity. By building more elaborate variants of message passing into GNN layers, we obtain GNN models of increasing expressiveness and power.

This is reminiscent of standard convolution: in essence, message passing and convolution are both operations that aggregate information from an element's neighbors in order to update the element's value. In a graph the element is a node; in an image it is a pixel. The difference is that the number of neighbors of a graph node varies, whereas every pixel in an image has a fixed number of neighbors.

By stacking message-passing GNN layers, a node can eventually incorporate information from the entire graph.

(figure)

The red box in the figure marks the aggregation of information from the vertex's 1-hop neighbors, that is, from all vertices at distance 1.

Learning edge representations

Our datasets do not always contain all types of information (node, edge, and global context). When we want to make predictions for nodes but the dataset only has edge information, we showed above how to use pooling to route information from edges to nodes, but only in the model's final prediction step. Using message passing, we can share information between nodes and edges within the GNN layer itself.

First, the information of the vertices incident to each edge is pooled and passed to the edge; after the edges are updated, the information of the edges incident to each vertex is pooled and passed back to the vertex, which is then updated through its MLP. If the vertex and edge vectors have different lengths, they must first be projected to a common dimension.

(figure)
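
A sketch of this alternating node-to-edge, edge-to-node scheme, assuming PyTorch; the linear projections handle the dimension mismatch mentioned above, and all names are illustrative:

```python
import torch
import torch.nn as nn

class NodeEdgeLayer(nn.Module):
    """One GNN layer that shares information between nodes and edges."""
    def __init__(self, node_dim=6, edge_dim=8):
        super().__init__()
        self.node_to_edge = nn.Linear(node_dim, edge_dim)  # project node vectors to edge size
        self.edge_to_node = nn.Linear(edge_dim, node_dim)  # project edge vectors to node size
        self.edge_mlp = nn.Linear(edge_dim, edge_dim)
        self.node_mlp = nn.Linear(node_dim, node_dim)

    def forward(self, nodes, edges, adjacency_list):
        # 1. pool the two incident vertex vectors into each edge, then update the edges
        pooled = torch.stack([self.node_to_edge(nodes[i] + nodes[j])
                              for i, j in adjacency_list])
        edges = torch.relu(self.edge_mlp(edges + pooled))
        # 2. pool the updated incident edge vectors back into each vertex,
        #    then update the vertices
        agg = torch.zeros_like(nodes)
        for e, (i, j) in enumerate(adjacency_list):
            agg[i] += self.edge_to_node(edges[e])
            agg[j] += self.edge_to_node(edges[e])
        return torch.relu(self.node_mlp(nodes + agg)), edges
```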

Which graph attributes we update, and in which order, is a design decision when building a GNN: we can update node embeddings before edge embeddings, or the other way around. This is an open research area with several solutions; for example, we could update in a "weave" fashion, producing four updated representations that are combined into new node and edge representations: node to node (linear), edge to edge (linear), node to edge (edge layer), and edge to node (node layer).

(figures)

Adding global information

So far, message passing has only considered 1-hop neighbors; if a node needs information from nodes far away, that is a problem. To address this, we introduce a master node or context vector: a virtual node U that can be connected to every other node and every edge. U is the global information, connected to everything in the graph.

When vertex information is aggregated onto the edges, U is aggregated along with it; when edge information is passed to the vertices, U is passed along too. Then, after the edges and vertices have been updated, the edge and vertex information is aggregated into U, which is updated by its own MLP.

(figure)
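
A minimal sketch of the update of U, assuming PyTorch (mean aggregation over nodes and edges; the concrete MLP is illustrative):

```python
import torch
import torch.nn as nn

def update_global(nodes, edges, u, global_mlp):
    """After nodes and edges have been updated, aggregate both into U."""
    pooled = torch.cat([nodes.mean(dim=0), edges.mean(dim=0), u])
    return global_mlp(pooled)

nodes, edges, u = torch.randn(4, 6), torch.randn(4, 8), torch.randn(5)
global_mlp = nn.Linear(6 + 8 + 5, 5)   # stands in for a learned MLP
u = update_global(nodes, edges, u, global_mlp)
```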

For vertex prediction we can use all, or a selection, of: the vertex's own information, neighboring vertex information, incident edge information, and the global information. These can simply be added together, or combined in other ways (for example, by concatenation).

(figure)

The role of multiple GNN layers

The more layers, the larger the GNN's "receptive field": each node takes information from more distant nodes into account, and its view of the graph becomes more comprehensive.

(figure)

Graph convolutional neural network GCN

Ordinary convolution vs. graph convolution

(figure)

Semi-supervised learning

GCN training is semi-supervised: it does not require every node to have a label.

When computing the loss, only the labeled nodes need to be considered.

As the loss on the labeled nodes is driven down, the nodes around them are adjusted accordingly as well; this is a characteristic of the graph structure. Therefore, in GNNs and GCNs, training does not require every node to have a label (of course, at least one node must be labeled).

(figure)
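
A sketch of the masked loss, assuming PyTorch; the tiny tensors and the mask are made up for illustration:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 2)                          # per-node class scores from the GCN
labels = torch.tensor([0, 1, 0, 1])
labeled = torch.tensor([True, False, True, False])  # only two nodes are labeled

# The loss is computed over the labeled nodes only; the unlabeled nodes still
# influence the result through message passing, which makes this semi-supervised.
loss = F.cross_entropy(logits[labeled], labels[labeled])
```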

GCN propagation formula

Softmax is a commonly used activation function for multi-class classification.

(figure)
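
The propagation rule in the figure is presumably the standard GCN layer of Kipf and Welling, which reads:

$$
H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}}\,\tilde{A}\,\tilde{D}^{-\frac{1}{2}}\,H^{(l)}\,W^{(l)}\right),
\qquad \tilde{A} = A + I
$$

where $H^{(l)}$ is the matrix of node features at layer $l$, $W^{(l)}$ is the learned weight matrix, $\tilde{A}$ is the adjacency matrix with self-loops, $\tilde{D}$ is its degree matrix, and $\sigma$ is the activation function (ReLU between layers, softmax at the output for classification).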

Calculation of GCN

Basic components of a graph

(figure)

Feature calculation method

(figure)

Transformation of adjacency matrix

(figure)

But now there is a problem: the larger a node's degree, the larger its values after the matrix multiplication (more terms are accumulated). This is undesirable; it would be as if the more people someone knows, the larger their feature values become, which is not what we want.

To solve this, we multiply by the inverse of the degree matrix. This amounts to averaging, and it reins in nodes with large degrees.

(figures)
Left-multiplying as above normalizes the rows; the columns then need to be normalized as well.

(figure)

But there is another problem: if both the rows and the columns are fully normalized, every entry gets normalized twice (the row and column normalizations overlap).

So instead we raise the inverse degree matrix to the power 0.5 on each side, which offsets the effect of this double normalization.

(figure)
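
Putting the whole transformation together as a NumPy sketch (self-loops via A + I, then the 0.5-power inverse degree matrix applied on both sides):

```python
import numpy as np

A = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0]], dtype=float)     # node 0 has high degree, like the rich person below

A_tilde = A + np.eye(len(A))                  # add self-loops so each node keeps its own features
d = A_tilde.sum(axis=1)                       # degrees of the self-looped graph
D_inv_sqrt = np.diag(d ** -0.5)               # the degree matrix to the power -1/2
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt     # normalize rows AND columns, each only "half-way"

# A_hat @ X now aggregates neighbor features without letting
# high-degree nodes dominate or being normalized twice.
```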

Interpretation of the GCN transformation

As shown in the figure below, suppose the person in the green box is rich and the person in the red box is poor, and they have known each other since childhood. The poor person knows only this rich person, while the rich person knows many people.

If only the rows are normalized, then since the poor person knows only the rich person, his degree is 1, and when his features are reconstructed, a large share of the information comes from the rich person. Such a model will most likely conclude that the poor person and the rich person are the same kind of person, which is clearly unreasonable.

Therefore, we normalize rows and columns at the same time, so that both the relationship of the poor person to the rich person and that of the rich person to the poor person are taken into account.

Put simply, normalizing the rows captures the fact that the rich person is very important to the poor person, while normalizing the columns captures the fact that the poor person may not be so important to the rich person (the rich person has a very large degree and the poor person a very small one, so the rich person may not even remember the poor person). This is more reasonable.

(figure)

Number of GCN layers

Graph convolutions can also be stacked into multiple layers, but the network is generally not made very deep, usually only 2-3 layers (compare the saying that through six acquaintances you can reach anyone in the world).

(figures)

Experiments show that in GCNs, deeper network structures often do not bring better results.
An intuitive explanation: my cousin knows a friend of a friend of a friend who knows the mayor, which does not mean that I have a good relationship with the mayor.

The more layers there are, the more diluted the feature representations become; generally 2-5 layers are sufficient.

(figure)

References

A Gentle Introduction to Graph Neural Networks

Zero-based multi-graph detailed explanation of graph neural network (GNN/GCN) [intensive reading of the paper]

It is indeed recognized as the best [Graph Neural Network GNN/GCN Tutorial], from basics to advanced to practical, all in one collection!

Graph Neural Network 7-Day Check-in Camp

Introduction to graph neural network, what is graph neural network, GNN

[Graph Neural Network Practical] Learn the graph neural network GNN in a simple and easy way (Part 1)

Source: blog.csdn.net/qq_41990294/article/details/133514788