Graph Attention Mechanism

1. Introduction

In graph neural networks, how do we build an attention mechanism?
For example, in our diagram there are only 5 nodes. Different neighbors of a node should affect its reconstruction differently.
It makes little sense for points 2 and 3 to contribute exactly the same amount to the reconstruction of point 1.
A traditional GNN layer multiplies the adjacency matrix by the feature matrix, and then by a trainable weight matrix, to obtain the reconstructed features (feature aggregation).
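As a rough sketch of that step (a minimal NumPy example, assuming a made-up 5-node adjacency matrix and random features; the shapes are only for illustration):

```python
import numpy as np

# Hypothetical 5-node graph with self-loops; node 1 (index 0) is connected
# to nodes 2 and 3 (indices 1 and 2), matching the example in the text.
A = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

H = np.random.randn(5, 8)   # input node features: 5 nodes, 8 dimensions each
W = np.random.randn(8, 16)  # trainable weight matrix mapping 8 -> 16 dimensions

# Traditional GNN layer: adjacency matrix x feature matrix x weight matrix.
H_new = A @ H @ W           # reconstructed (aggregated) features, shape (5, 16)
```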
The graph attention mechanism takes the importance of each neighbor into account when aggregating features.
Compute a weight for edge 1–2 and for edge 1–3, say e12 = 0.8 and e13 = 0.2. Then, when reconstructing the feature of point 1, the network knows to pay more attention to point 2 and less to point 3.
In effect, it is the traditional GNN process plus a weight term: a weight is attached to each "edge".


2. Computing the weights

There are several ways to compute the weights:

2.1 Inner product + softmax

For example, to compute the relation between point 1 and point 2, take hi and hj with i = 1 and j = 2, the feature vectors of points 1 and 2. W is a trainable weight matrix that maps the features to a new dimension.
Whi and Whj are still two vectors; taking their inner product gives the weight value.
The key question is how to compute this attention coefficient.
The purpose of the softmax is to make the weights of all edges attached to a node sum to 1.
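A minimal sketch of this scoring scheme, assuming a small 5-node graph with a hand-written neighbor list and random features (all names and shapes here are illustrative, not from the original article):

```python
import numpy as np

def inner_product_attention(h, W, neighbors_of):
    """Edge weights from inner products of mapped features, softmaxed per node."""
    Wh = h @ W                                            # map every node's feature
    N = h.shape[0]
    alpha = np.zeros((N, N))
    for i in range(N):
        nbrs = neighbors_of[i]
        scores = np.array([Wh[i] @ Wh[j] for j in nbrs])  # e_ij = <W h_i, W h_j>
        scores = np.exp(scores - scores.max())            # softmax over i's neighbors
        scores /= scores.sum()
        for j, a_ij in zip(nbrs, scores):
            alpha[i, j] = a_ij                            # weights of i's edges sum to 1
    return alpha

h = np.random.randn(5, 5)   # 5 nodes, 5-dimensional features
W = np.random.randn(5, 8)   # shared trainable mapping matrix
neighbors_of = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 4], 4: [2, 3]}
alpha = inner_product_attention(h, W, neighbors_of)
print(alpha[0])             # attention of node 1 (index 0) over nodes 2 and 3
```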

2.2 Feature mapping × trainable weight vector + softmax

Take features h1 and h2 of points 1 and 2 (assume the initial dimension is 5). After feature mapping (e.g. multiplying by a 5×8 matrix W) we get Wh1 and Wh2. Concatenating Wh1 and Wh2 gives a new 16-dimensional feature.
Instead of using the vector inner product, we can then multiply this by a 16×1 trainable parameter vector to obtain the weight value.
A LeakyReLU is applied to this raw score, keeping negative scores (with a small slope) rather than zeroing them out; the softmax then makes the edge weights around each reconstructed point sum to 1.
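A small sketch of this variant, assuming the dimensions used above (5-dimensional inputs, a 5×8 mapping W, a 16×1 attention vector a) and random parameters:

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

h1, h2, h3 = np.random.randn(3, 5)  # features of points 1, 2, 3 (5-dim each)
W = np.random.randn(5, 8)           # feature mapping: 5 -> 8 dimensions
a = np.random.randn(16)             # trainable 16x1 attention vector

def edge_score(hi, hj):
    concat = np.concatenate([hi @ W, hj @ W])  # [W h_i || W h_j], 16-dimensional
    return leaky_relu(concat @ a)              # raw score e_ij

# Point 1 is connected to points 2 and 3; softmax over its edges -> weights sum to 1.
e12, e13 = edge_score(h1, h2), edge_score(h1, h3)
exps = np.exp(np.array([e12, e13]) - max(e12, e13))
alpha12, alpha13 = exps / exps.sum()
print(alpha12, alpha13, alpha12 + alpha13)     # e.g. 0.73, 0.27, 1.0
```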

3. The essence of the principle

The essence is to weight the adjacency matrix: attention is really a re-weighting of the adjacency matrix.
In the original adjacency matrix, the entries for a node's connected neighbors are all 1. Isn't that equivalent to assuming, by default, that every neighbor contributes equally to the node's own reconstruction?
After adding the attention mechanism, the entries for the connected neighbors become the corresponding weights, for example 0.8 and 0.2. The network then naturally knows which point contributes more.
The overall GNN process, the input feature vectors, and the trainable weight matrix all stay the same; only the adjacency matrix is transformed.
Multiply the corresponding positions of the adjacency matrix by the weights computed above. When this new weighted adjacency matrix is multiplied by the feature matrix for reconstruction, the neighbor entries are no longer all 1, and the network knows which point contributes more to a node's own reconstruction.
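Continuing the earlier sketch, this is the same aggregation step with the 0/1 adjacency matrix replaced by made-up attention weights (the 0.8/0.2 row mirrors the example above; the other rows are arbitrary):

```python
import numpy as np

# Attention-weighted adjacency matrix: each row sums to 1 thanks to the softmax.
A_att = np.array([
    [0.0, 0.8, 0.2, 0.0, 0.0],  # point 1 attends 0.8 to point 2, 0.2 to point 3
    [0.5, 0.0, 0.0, 0.5, 0.0],
    [0.3, 0.0, 0.0, 0.0, 0.7],
    [0.0, 0.6, 0.0, 0.0, 0.4],
    [0.0, 0.0, 0.1, 0.9, 0.0],
])

H = np.random.randn(5, 8)   # input features, unchanged
W = np.random.randn(8, 16)  # trainable weight matrix, unchanged

H_new = A_att @ H @ W       # same GNN step, but neighbors now contribute unequally
```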

4. Calculation process

Point 1 is connected only to points 2 and 3. Take the input features h1, h2, h3, multiply them by a trainable weight matrix W for feature mapping to get Wh1, Wh2, Wh3, and finally multiply each by the weight computed for its edge before summing.
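Putting it together for point 1 as a minimal sketch, with made-up dimensions and the example weights 0.8 and 0.2 (a self-loop term for point 1 itself is often added as well, but is omitted here to match the text):

```python
import numpy as np

h1, h2, h3 = np.random.randn(3, 5)      # input features of points 1, 2, 3
W = np.random.randn(5, 8)               # trainable mapping matrix
alpha12, alpha13 = 0.8, 0.2             # edge weights from section 2 (they sum to 1)

# Feature mapping, then weighted aggregation over point 1's neighbors.
Wh2, Wh3 = h2 @ W, h3 @ W
h1_new = alpha12 * Wh2 + alpha13 * Wh3  # reconstructed feature of point 1
print(h1_new.shape)                     # (8,)
```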


Reprinted from: blog.csdn.net/weixin_50557558/article/details/131896677