[Intensive reading of the paper sentence by sentence] SimGNN: the standard graph-network processing flow + NTN/PNC for fast graph similarity matching

Original link: SimGNN


Advice before reading

You should have a GCN foundation. Supporting theoretical material: the from-scratch graph machine learning / graph neural network notes.
At least finish the intensive reading of DeepWalk and have some understanding of node embedding: intensive reading of the DeepWalk paper, intensive reading of the Node2Vec paper.

Background knowledge

SVD decomposition

In a large graph, the matrix that stores the node embeddings is often very large, which is very unfavorable for subsequent computation. In NLP and recommender systems, singular value decomposition (SVD) is commonly used to reduce the maintenance cost and computational overhead of such large tables. Suppose there are 10 million users and 1 million products, and we want a table recording each user's preference for each product. That table has shape [10 million, 1 million] and is huge and sparse, because no user has interacted with all 1 million products: if a user likes 34 products, the remaining 999,966 entries of that row are all 0. Doing recommendation and information mining directly on such a table is very difficult.
The approach adopted in industry is therefore to replace the maintenance of this one table with two tables of shapes [10 million, k] and [k, 1 million], where k is a hidden factor count: the number of latent influences through which a user and a product may be related. If k = 10, we only need to maintain a [10 million, 10] table and a [10, 1 million] table; the computation drops by several orders of magnitude and the sparsity problem is greatly alleviated. To change the example: 1 million people in a city want to buy a house and there are 10 million housing units on the market. Having every buyer go door to door is far too difficult; going through a small number of intermediaries who know both the buyers' needs and the housing stock makes the process much more efficient and of higher quality, and those intermediaries play the role of the k latent factors.
Therefore, the key in SVD is to find an appropriate value of k, so that computation over one huge table becomes the maintenance of two small ones.
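To make the idea concrete, here is a minimal sketch (my own illustration, not from the original post) of a truncated SVD on a small sparse preference matrix; the matrix sizes and k = 10 are toy assumptions standing in for the 10-million-by-1-million case.

```python
import numpy as np

# Toy user-item preference matrix (assumed shapes; the real case would be
# ~10 million x 1 million and extremely sparse).
rng = np.random.default_rng(0)
R = np.zeros((1000, 500))
# Each user only rates a handful of items, so the table is mostly zeros.
item_idx = rng.integers(0, 500, size=(1000, 5))
for u in range(1000):
    R[u, item_idx[u]] = rng.random(5)

# Truncated SVD: keep only the top-k singular values/vectors.
k = 10
U, S, Vt = np.linalg.svd(R, full_matrices=False)
U_k = U[:, :k] * S[:k]        # [num_users, k]  small "user" table
V_k = Vt[:k, :]               # [k, num_items]  small "item" table

# Instead of maintaining the huge table R, we maintain the two small tables;
# their product is a rank-k approximation of R.
R_approx = U_k @ V_k
print(R_approx.shape, np.linalg.norm(R - R_approx) / np.linalg.norm(R))
```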

Paper Intensive Reading

Title and Authors

This is a paper published at WSDM 2019 (the ACM conference on Web Search and Data Mining). The title is SimGNN: A Neural Network Approach to Fast Graph Similarity Computation. The title already shows the two key points: graph similarity computation, and doing it fast, i.e., how to compute it both quickly and accurately.

The authors are researchers from UCLA, Purdue University and Zhejiang University. Yunsheng Bai and several well-known universities have jointly published work on graph similarity. However, the bottleneck of the current technology is still the generation and processing of large-scale data: there is no unified specification, and the situation resembles the early days of hand-crafted feature engineering, where everyone does things their own way.

Abstract

The author first lists some applications of graph similarity and notes that traditional graph matching methods usually compare graphs through connection structure (distances, neighborhoods). Now that graph neural networks are well developed, there should be other ways to measure graph similarity.
In the second paragraph, the author says that to match the similarity of two graphs, you first need to represent each graph as a vector (graph embedding) and then compare the similarity of the two vectors; secondly, the author believes the nodes also need to be embedded, so that finer-grained information can be mined. At this point the two core problems of the paper are specified: how to represent graphs as vectors, and how to represent nodes as vectors.

Introduction

First, the author introduces the problem of graph similarity computation. The traditional methods, graph edit distance (GED) and maximum common subgraph (MCS), are (1) time-consuming and (2) unable to handle large graphs.
To address these two problems, two families of methods have been proposed so far. The first is the pruning-verification framework: since large graphs cannot be computed directly, find a way to shrink the problem. Through a series of database indexing techniques and pruning strategies, the total amount of exact graph similarity computation per query can be reduced to a manageable level. However, the author argues that this does not reduce the time complexity of the exact computation itself. The second family directly reduces the cost of each graph similarity computation: instead of computing a more accurate value, it finds an approximate value faster from a mathematical point of view and then stops the iterative computation early. But these methods usually require rather complicated design and implementation based on discrete optimization or combinatorial search, and the author argues that the time complexity is essentially unchanged.
To achieve fast computation, the author proposes the SimGNN algorithm and sketches how the training stage and the prediction stage work; the algorithm itself is discussed later.

This embedding method should satisfy three conditions:
(1) Representation invariance. For a given graph, no matter how its adjacency matrix is permuted, the resulting embedding vector should stay the same;
(2) Inductive capability. For graphs never seen during training, it should still be possible to compute their vector representations;
(3) Learnability. This is essentially a statement about generality: the model should be adaptable to any graph similarity task.

We first design a learnable embedding function that maps each graph into a vector, providing a global summary (graph embedding). Second, we design a pairwise node comparison method that supplements the graph-level embedding with fine-grained node-level information (node embedding). The model achieves better generalization on unseen graphs.
This paragraph is essentially self-promotion: based on graph embedding plus node embedding, the model beats the other baselines across the board.

2 Preliminaries

2.1 Graph Edit Distance (GED)

The author first introduces GED. GED is the minimum number of edit operations needed to transform one graph into the other, i.e., the cost of the optimal alignment between the two graphs.
In the paper's example, the graph needs three edit operations. The allowed edit operations are inserting or deleting a vertex/edge, or relabeling a node.
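As a small illustration (not from the paper), exact GED between two toy graphs can be computed with NetworkX's built-in search; the worst-case exponential cost of this exact computation is precisely the scalability problem SimGNN targets.

```python
import networkx as nx

# Two small graphs that differ by a single edit.
G1 = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 0)])   # a 4-cycle
G2 = nx.Graph([(0, 1), (1, 2), (2, 3)])           # a path on 4 nodes

# Exact GED: minimum number of node/edge insertions, deletions, relabelings.
print(nx.graph_edit_distance(G1, G2))             # -> 1.0 (delete one edge)
```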

2.2 Graph Convolutional Networks (GCN)

For an explanation of this part, refer to the GCN section of the from-scratch graph machine learning / graph neural network notes. Personally, I feel Sections 2.1 and 2.2 of the paper are too brief to explain these topics clearly.

3 The proposed approach: SimGNN

The author then presents SimGNN. First, it is an end-to-end model. In fact, almost all neural networks are end-to-end: the model is a one-stop service that goes automatically from input to output, with nothing to be done manually in between. In the case of SimGNN, the author notes that it takes a pair of graphs as input and outputs their similarity score.

3.1 Strategy one: Graph-Level Embedding Interaction

Here the author describes the four concrete steps. The first step is node embedding; the second step is graph embedding. Note that the author uses the term attention-based: if you have read the graph neural network notes, you know that the attention mechanism in graph networks is essentially a weighting process over the adjacency matrix, so it is not difficult in essence. The third and fourth steps feed the resulting vectors into the similarity-score computation. The core, therefore, is still steps 1 and 2.
Node embedding is simply done with GCN.
Graph embedding part. The author notes that a graph is composed of a set of nodes, and each node has already been represented as a vector. The simplest aggregation would be averaging: for a graph with three nodes A, B, C, the graph vector could be computed as $(\mathrm{Vector}_A + \mathrm{Vector}_B + \mathrm{Vector}_C)/3$. But the author argues that an attention mechanism should be used to weight these nodes. As for how to compute the weights, the traditional approach is to use the degree of connection: with three nodes in a chain, the middle node should be the important one. The author argues this is not necessarily true: a node is not important merely because it has more connections; it might just be a "social butterfly" with many shallow connections. So judging importance purely by structure or degree is biased.
Concretely, the N nodes are first represented as D-dimensional vectors, giving a table with N rows and D columns. A weight matrix with D rows and D columns is then trained, and right-multiplying the node table by this matrix gives a new N-by-D table enriched with the learned weights. This completes the attention computation. More specifically:

For each node n, the relationship between that node's embedding and a global graph feature needs to be computed.
Aside: in deep learning it is simply taken for granted that some features end up with particularly large values and others with particularly small ones; the community still has no rigorous explanation for this and treats it as an empirical regularity.
At this step we already have the node embeddings from GCN, so a graph vector can be obtained by TopK pooling or other means (average pooling, max pooling, and so on all work). The pooled graph vector is contributed mostly by nodes with large feature values; in other words, it is a concentrated expression of the large values in each dimension. For example, if one node's vector is (0.001, 0.002) and another node's vector is (0.9, 0.8), the average-pooled vector (0.45, 0.4) is mainly contributed by the node with the large features.
Next, compute the similarity between each node's embedding vector and the pooled global vector. This is also very simple: just take the inner product. Suppose 99 nodes all have the vector (0.9, 0.8) and one node is (0.001, 0.002); the pooled vector is still roughly (0.9, 0.8), so the inner product of that odd node with it is close to 0, i.e., it is very unimportant.
The inner product is then mapped to a weight that represents the importance of each node.
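Below is a minimal PyTorch sketch of the attention pooling just described (my own paraphrase, not the authors' released code): a learnable D-by-D matrix turns the mean node embedding into a global context vector, each node is scored against that context by an inner product, and the weighted sum of node embeddings gives the graph embedding. The tanh/sigmoid choices and the dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)   # the learnable D x D weight matrix

    def forward(self, node_emb):                   # node_emb: [N, D]
        # Global context vector: transform the mean node embedding (squashed by tanh).
        context = torch.tanh(self.W(node_emb.mean(dim=0)))          # [D]
        # Attention weight of each node = sigmoid(inner product with the context).
        att = torch.sigmoid(node_emb @ context)                     # [N]
        # Graph embedding = attention-weighted sum of node embeddings.
        return (att.unsqueeze(-1) * node_emb).sum(dim=0)            # [D]

node_emb = torch.randn(8, 16)        # e.g. 8 nodes, 16-dim embeddings
graph_emb = AttentionPooling(16)(node_emb)
print(graph_emb.shape)               # torch.Size([16])
```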
The overall pipeline is: first, feed in the adjacency matrix and the node encoding vectors, and obtain a vector representation of each node through GCN:

$$\text{GCN process:}\quad H^{k+1} = \sigma\left(\widetilde{D}^{-1/2}\,\widetilde{A}\,\widetilde{D}^{-1/2}\, H^{k} W^{k}\right)$$

Then pooling and the attention computation produce the graph-level embedding vector.
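For reference, here is a dense toy implementation of the propagation rule above, assuming the graph is small enough that its full adjacency matrix fits in memory (real implementations use sparse operations).

```python
import torch

def gcn_layer(A, H, W):
    """One GCN layer: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_tilde = A + torch.eye(A.size(0))                  # add self-loops
    deg = A_tilde.sum(dim=1)
    D_inv_sqrt = torch.diag(deg.pow(-0.5))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt           # symmetrically normalized adjacency
    return torch.relu(A_hat @ H @ W)

A = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])   # toy 3-node path graph
H = torch.randn(3, 8)                                          # initial node features
W = torch.randn(8, 16)                                         # layer weight matrix
print(gcn_layer(A, H, W).shape)                                # torch.Size([3, 16])
```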

Now that the vector representations $h_i$ and $h_j$ of graphs $G_i$ and $G_j$ have been obtained, the author uses a module called NTN (Neural Tensor Network) to consider the interaction between the two graphs from K dimensions.
NTN is described rather vaguely in this paper. NTN was first used in NLP and recommender systems, where its role was essentially an advancement over SVD; the two are functionally very similar, which is why SVD was introduced in the background-knowledge section. Back to this paper: the authors use matrices W and V of learnable parameters to mediate between the two graph embeddings. Specifically, NTN consists of three parts (see the sketch after the list):
The first part is the bilinear term $h_i^{T} W_k\, h_j$ (one learnable matrix $W_k$ per dimension $k$), which mines the interaction information between the two graphs;
the second part is $V \cdot \mathrm{CONCAT}(h_i, h_j)$, which fuses the features of the two graphs;
the third part is a bias term $b$.
Therefore, the NTN computation can be roughly summarized as:

$$Y = W_1 X + W_2 X + b$$
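A hedged PyTorch sketch of such an NTN block, following the three parts listed above; the shapes, the ReLU activation and K = 8 are my assumptions for illustration, not values fixed by the paper.

```python
import torch
import torch.nn as nn

class NTN(nn.Module):
    """Neural Tensor Network scoring two graph embeddings from K views (sketch)."""
    def __init__(self, dim, k):
        super().__init__()
        self.W = nn.Parameter(torch.randn(k, dim, dim))   # bilinear tensor, one slice per view
        self.V = nn.Parameter(torch.randn(k, 2 * dim))    # weights on the concatenation
        self.b = nn.Parameter(torch.zeros(k))             # bias term

    def forward(self, h_i, h_j):                          # h_i, h_j: [D]
        bilinear = torch.einsum('d,kde,e->k', h_i, self.W, h_j)    # h_i^T W_k h_j for each k
        concat = self.V @ torch.cat([h_i, h_j])                    # V * CONCAT(h_i, h_j)
        return torch.relu(bilinear + concat + self.b)              # [K] interaction scores

h_i, h_j = torch.randn(16), torch.randn(16)
print(NTN(16, 8)(h_i, h_j).shape)    # torch.Size([8])
```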

NTN integrates the information of the two graphs, but even though the graph vectors are combined and recombined in various ways, the author says this approach still has a limitation: some differences between the two graphs, in particular small structural differences, cannot be reflected. An example:
consider two boys–Teacher Wang–girls networks. In one network the boys average 10 points on a test and the girls average 90; in the other the boys average 90 and the girls average 10. Under the previous computation rules the two networks look globally identical, yet the details are completely different. Similarly, isomers in chemistry raise the same problem for molecular formulas.
The author believes such differences are mainly caused by small structures, which global vectors cannot capture.

3.2 Strategy two: pairwise node comparison


Continuing from the question above, the author uses the second trick in SimGNN. First, note that the node-embedding matrices produced by GCN do not necessarily have the same number of rows: if $G_i$ has 8 nodes and $G_j$ has 6, the node-embedding tables after GCN would be [8, 5] and [6, 5]. The author uses fake padding to resolve this inconsistency: fill the smaller matrix with zero rows so that [6, 5] becomes [8, 5].
Then, compute a similarity (inner product) between every node of the first graph and every node of the second graph, which yields an [8, 8] matrix. Since two rows were zero-padded, the last two columns of this [8, 8] matrix are also 0.
Then, reshape the [8, 8] matrix into [1, 64] and run a histogram over these 64 numbers; the number of bins can be chosen freely. For example, suppose the reshaped result is [0.01, 0.02, 0.03, ..., 0.64] and we set the number of bins to 8, so that 0.01–0.08 is one interval, 0.09–0.16 is the next, and so on. Counting how the 64 numbers fall into these bins gives [8, 8, 8, 8, 8, 8, 8, 8], which is the output of the pairwise node comparison.
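A rough sketch of the pairwise node comparison feature under the assumptions above (8 vs. 6 nodes, 8 bins). The sigmoid squashing of the raw inner products to [0, 1] and the final normalization are extra assumptions I add so the histogram range is fixed; they are not spelled out in the post.

```python
import torch

def pairwise_node_comparison(U_i, U_j, bins=8):
    """Sketch of the PNC feature: pad, pairwise similarities, histogram of the scores."""
    n = max(U_i.size(0), U_j.size(0))
    # Fake padding: fill the smaller embedding table with zero rows.
    pad = lambda U: torch.cat([U, torch.zeros(n - U.size(0), U.size(1))], dim=0)
    U_i, U_j = pad(U_i), pad(U_j)
    S = torch.sigmoid(U_i @ U_j.t())                 # [n, n] pairwise similarity scores
    # Histogram over all n*n scores; the bin counts become the PNC feature vector.
    hist = torch.histc(S.flatten(), bins=bins, min=0.0, max=1.0)
    return hist / hist.sum()                         # normalize so graph size doesn't dominate

U_i = torch.randn(8, 16)     # node embeddings of G_i (8 nodes)
U_j = torch.randn(6, 16)     # node embeddings of G_j (6 nodes)
print(pairwise_node_comparison(U_i, U_j))            # 8-dim histogram feature
```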

Finally, concatenate the results of NTN and PNC. The combined vector contains both graph-level and node-level information; feeding it into an MLP produces the final similarity prediction.
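A tiny sketch of this last step, with assumed feature sizes (8 NTN interaction scores plus 8 histogram bins) and an assumed sigmoid output for a normalized similarity score:

```python
import torch
import torch.nn as nn

ntn_out = torch.randn(8)           # graph-level interaction features from NTN
pnc_hist = torch.rand(8)           # node-level histogram features from PNC

mlp = nn.Sequential(
    nn.Linear(16, 16), nn.ReLU(),
    nn.Linear(16, 1), nn.Sigmoid()  # predicted (normalized) similarity score
)
score = mlp(torch.cat([ntn_out, pnc_hist]))
print(score)
```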

4 Experiments

From here on, the paper reports results on various datasets. We won't do an intensive reading of the self-promotion; you can go straight to the actual code.


Origin blog.csdn.net/D_Ddd0701/article/details/131591723