GNN (Graph Neural Network) study notes (kept for my own reference)

 Benchmark datasets and tasks

Spatial-based GNN

From layer i to layer i+1, a spatial-based convolution (SBC) layer updates each node's feature by aggregating the features of its neighboring nodes.

The node features of the last layer then need to be combined into one representative feature. This step is called readout: all node features are combined into a single graph-level feature, which is used for classification or regression.
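As a minimal illustration (not from the source), here is what a mean-pooling readout might look like in NumPy, assuming the last layer's node features sit in an array H of shape [num_nodes, feature_dim]; mean pooling is just one common choice:

```python
import numpy as np

def readout_mean(H):
    """Combine all node features into one graph-level feature.

    H: [num_nodes, feature_dim] node features from the last GNN layer.
    Returns a [feature_dim] vector representing the whole graph,
    ready to feed into a classifier or regressor.
    """
    return H.mean(axis=0)

# Example: a graph with 5 nodes, each carrying a 16-dim feature
H_last = np.random.randn(5, 16)
graph_feature = readout_mean(H_last)   # shape (16,)
```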

 

NN4G (Neural Network for Graph)

Before studying this, first review the idea of an embedding matrix.


As before, suppose we have a vocabulary of 10,000 words, running from a and aaron through orange to zulu, plus possibly an unknown-word token UNK. What we are going to do is learn an embedding matrix E, which will be a 300 × 10,000 matrix if your vocabulary has 10,000 words (or 300 × 10,001 if you also count the unknown-word token). The columns of this matrix are the embedding vectors of the 10,000 different words in the vocabulary. Suppose the word number of orange is 6257, meaning it is the 6257th word in the vocabulary. We use the symbol O_6257 for its one-hot vector (this vector has a 1 in position 6257 and 0 everywhere else). It is clearly a 10,000-dimensional column vector with a 1 in just one position, and its height equals the width (number of columns) of the embedding matrix on its left.

If the embedding matrix E is multiplied by O_6257, we get a 300-dimensional vector: E is 300 × 10,000 and O_6257 is 10,000 × 1, so their product is 300 × 1, i.e. a 300-dimensional vector. Each element of the product is the dot product of one row of E with the one-hot vector, which simply picks out that row's entry in column 6257; carrying on through all the rows yields the rest of the vector. The resulting 300-dimensional vector is written e_6257. This is the symbol we use for the 300 × 1 embedding vector, and the word it represents is orange.

Remember, our goal is to learn an embedding matrix E. In the operations that follow, you randomly initialize the matrix E and then use gradient descent to learn the parameters of this 300 × 10,000 matrix. Multiplying E by the one-hot vector gives the embedding vector. But implemented literally, this is very inefficient: the one-hot vector is very high-dimensional and almost all of its elements are 0, so the matrix-vector multiplication wastes nearly all of its work. So in practice you use a specialized lookup function to pull out a single column of E directly, rather than doing the usual matrix multiplication; the multiplication form is just more convenient when drawing schematics (as in the figure above, E multiplied by a one-hot vector).
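A small NumPy sketch of both routes to e_6257 (the matrix is random here just to show the shapes; the variable names are mine, and 0- vs 1-based indexing is glossed over):

```python
import numpy as np

vocab_size, embed_dim = 10_000, 300
E = np.random.randn(embed_dim, vocab_size)   # embedding matrix, 300 x 10,000

word_index = 6257                            # "orange"

# The textbook picture: multiply E by the one-hot vector O_6257.
o = np.zeros(vocab_size)
o[word_index] = 1.0
e_slow = E @ o                               # (300,) but wastes work on zeros

# What you do in practice: read off column 6257 of E directly.
e_fast = E[:, word_index]                    # the same 300-dim vector

assert np.allclose(e_slow, e_fast)
```

Deep learning frameworks expose this column lookup as an embedding layer, which is why the one-hot multiplication only ever appears in diagrams.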


 

Aggregation in NN4G: to compute a node's new feature, first add up the features of its adjacent neighbors and transform the sum, then add in (a transform of) the node's original input feature.
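A hedged NumPy sketch of that step (the names A, H, X, W, W_in are mine, not from the source; A is the adjacency matrix and X the original input features):

```python
import numpy as np

def nn4g_layer(A, H, X, W, W_in):
    """One NN4G-style aggregation step.

    A:    [n, n] adjacency matrix (A[i, j] = 1 if i and j are neighbors)
    H:    [n, d] node features from the previous layer
    X:    [n, d_in] original input features of the nodes
    W:    [d, d_out] transform for the summed neighbor features
    W_in: [d_in, d_out] transform for the original input features
    """
    neighbor_sum = A @ H               # row v = sum of node v's neighbors
    return neighbor_sum @ W + X @ W_in
```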

 

 

Readout in NN4G: take the mean of the node features at each hidden layer, apply a transform to each layer's mean, and then add the results up across layers; the sum represents the feature of the entire graph.
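A sketch with hypothetical names (Hs is a list of per-layer node-feature matrices, Ws a matching list of per-layer transforms):

```python
def nn4g_readout(Hs, Ws):
    """NN4G-style readout.

    Hs: list of [n, d] node-feature matrices, one per layer
    Ws: list of [d, d_out] transforms, one per layer

    Mean the node features within each layer, transform each
    layer's mean, then sum across layers.
    """
    layer_means = [H.mean(axis=0) for H in Hs]          # one d-vector per layer
    return sum(m @ W for m, W in zip(layer_means, Ws))  # [d_out] graph feature
```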

DCNN (Diffusion-Convolution Neural Network)

DCNN, layer 1: for node 3, add up the input features of all nodes at distance 1 from it and take the average, then apply a weight transform to that average.

When we get to the second layer, the aggregation runs over nodes at distance 2, but the features being averaged are still the original input features rather than the first layer's output (the layers do not feed into one another).
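Putting the two notes together, a NumPy sketch of DCNN-style layers (the helper names are mine; it assumes an unweighted adjacency matrix A and original input features X). Each layer k averages the input features over nodes at exactly hop distance k, then applies that layer's transform:

```python
import numpy as np

def hop_distances(A, max_k):
    """Shortest-path hop distances up to max_k, from adjacency matrix A."""
    n = A.shape[0]
    D = np.full((n, n), np.inf)
    np.fill_diagonal(D, 0)
    reached = np.eye(n, dtype=bool)
    P = np.eye(n)
    for k in range(1, max_k + 1):
        P = P @ A                      # counts walks of length k
        newly = (P > 0) & ~reached     # first reached at step k => distance k
        D[newly] = k
        reached |= newly
    return D

def dcnn_layer(A, X, W, k):
    """Layer k of a DCNN-style network.

    For each node v, average the ORIGINAL input features X over all
    nodes at hop distance exactly k from v, then apply the layer's
    transform W. Layers do not feed into each other: each reads X.
    """
    D = hop_distances(A, k)
    H = np.zeros((A.shape[0], X.shape[1]))
    for v in range(A.shape[0]):
        mask = D[v] == k
        if mask.any():
            H[v] = X[mask].mean(axis=0)
    return H @ W
```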

DGC (Diffusion Graph Convolution)

 MoNET 

 

 GraphSAGE

 

GAT (Graph Attention Networks)

GIN (Graph Isomorphism Network)

 


Source: blog.csdn.net/jcandzero/article/details/127151514