Paper reading "2022WWW: Rethinking Graph Convolutional Networks in Knowledge Graph Completion"

Paper link

Introduction to the paper's work

GCNs are effective in modeling graph structures. GCN-based KGC models usually use an encoder-decoder framework, with GCNs and KGE models acting as encoders and decoders, respectively.
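As a rough illustration of this encoder-decoder framework (a minimal sketch with simplified aggregation, not the code of any specific model), a one-layer GCN encoder paired with a DistMult decoder could look like this:

```python
import torch
import torch.nn as nn

class GCNEncoderKGEDecoder(nn.Module):
    """Sketch of the encoder-decoder framework: a GCN encoder refines entity
    embeddings over the graph, and a KGE scoring function (DistMult here)
    acts as the decoder. Shapes and the aggregation rule are simplified."""
    def __init__(self, num_entities, num_relations, dim):
        super().__init__()
        self.ent = nn.Embedding(num_entities, dim)
        self.rel = nn.Embedding(num_relations, dim)
        self.w = nn.Linear(dim, dim)  # GCN layer weight

    def encode(self, adj):
        # one GCN layer: aggregate neighbor embeddings via the adjacency matrix,
        # then apply a linear transformation and a nonlinearity
        h = adj @ self.ent.weight
        return torch.relu(self.w(h))

    def decode(self, h, heads, rels, tails):
        # DistMult decoder: score(h, r, t) = <e_h, e_r, e_t>
        return (h[heads] * self.rel(rels) * h[tails]).sum(-1)

    def forward(self, adj, heads, rels, tails):
        return self.decode(self.encode(adj), heads, rels, tails)
```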

However, many GCN-based KGC models fail to outperform state-of-the-art KGE models despite introducing additional computational complexity.

The authors find that the graph structure modeling in GCNs does not significantly improve KGC performance; instead, it is the transformation of entity representations that brings the performance improvement. The LTE-KGE framework proposed in this paper brings similar performance improvements to KGE models while avoiding the heavy computation of GCN aggregation.

Despite the additional computational complexity introduced by GCNs, GCN-based models do not exhibit significant advantages over state-of-the-art KGC models. This raises two questions: (1) Can GCNs really bring performance gains? (2) Which factor of GCNs is crucial in KGC?

Can GCNs really deliver performance gains?

The answer is yes. For the KGC task, GCNs do bring performance gains over KGE models.

In most cases, GCNs, especially the state-of-the-art CompGCN, significantly improve the performance of KGE models. However, on FB237, WGCN+TransE performs worse than TransE in both the original and the reproduced results. A similar phenomenon can be observed for WGCN+DistMult/ConvE on WN18RR. This shows that not all GCNs can improve the performance of all KGE models.

Whether the graph structure modeling of GCNs is crucial in KGC is less explored, and which factor of GCNs is crucial in KGC is unclear. Therefore, this paper conducts extensive experiments to test the influence of the graph structure, neighbor information, self-loop information, and the linear transformation of relations.

Which factor of GCNs is crucial in KGC? - Graph structure

1) Graph structure

GCNs are known to be effective in modeling graph structures. Therefore, if we break the graph structure, the performance of GCN-based KGC models is expected to drop significantly.

We conduct experiments with random adjacency tensors to explore the effect of graph structure. Specifically, when constructing an adjacency tensor for message passing, given a valid triple, we replace the tail entity with a random entity in the knowledge graph.

Note that only the random adjacency tensors are used in message passing, while the train/validation/test triples remain unchanged.
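The corruption itself is simple; below is a minimal sketch of the idea (my illustration, not the authors' code), assuming the adjacency tensor is built from a list of (head, relation, tail) triples:

```python
import random

def randomize_adjacency(message_passing_triples, num_entities, seed=0):
    """For every (head, relation, tail) triple used to build the adjacency
    tensor, replace the tail with a uniformly random entity. Only the
    message-passing graph is corrupted; train/validation/test triples are
    left unchanged."""
    rng = random.Random(seed)
    return [(h, r, rng.randrange(num_entities)) for h, r, _ in message_passing_triples]
```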

Surprisingly, randomly corrupting the adjacency tensor, i.e., the graph structure, does not affect the overall performance of GCN-based KGC models on the two datasets. Models with random adjacency tensors achieve performance comparable to their counterparts with normal adjacency tensors. For WGCN+TransE, the random graph structure even improves performance on FB237.

The results show that while the GCN encoder can improve the performance of the KGE model, the graph structure modeling in GCNs is not critical to the performance improvement.

Which factor of GCNs is crucial in KGC? - Neighbor information

2) Neighbor information

To further explore the relationship between graph structure modeling and the performance improvement of GCNs, we conduct experiments without using neighbor information during aggregation. That is, the graphs used in the GCNs have no edges (relations) between nodes (entities), and new entity representations are generated only from the entities' previous representations.
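In terms of a generic GCN layer update, this WNI ablation keeps only the self-loop term; a hedged sketch (the names w_self and w_neigh are my own):

```python
def gcn_layer_wni(ent_emb, adj, w_self, w_neigh):
    """WNI ablation sketch: drop the neighbor-aggregation term, so each
    entity's new representation depends only on its own previous one.
    The full layer would be: w_neigh(adj @ ent_emb) + w_self(ent_emb)."""
    return w_self(ent_emb)  # self-loop term only
```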

Table 4 shows that the models without neighbor information (X+WNI) perform comparably to the original models on both datasets. This shows that the performance gain does not come from neighborhood aggregation.

Which factor of GCNs is crucial in KGC? - Self-loop information

3) Self-loop information

To determine whether self-loop information is required for performance gain, we conduct experiments without self-loop information. That is, the representation of an entity is generated only based on the representations of its neighbor entities and relations.
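Analogously to the WNI sketch above, the WSI ablation keeps only the neighbor-aggregation term (again only a schematic sketch):

```python
def gcn_layer_wsi(ent_emb, adj, w_self, w_neigh):
    """WSI ablation sketch: drop the self-loop term, so each entity's new
    representation is built only from its neighbors' representations."""
    return w_neigh(adj @ ent_emb)  # neighbor aggregation only
```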

Table 5 shows the results without self-loop information (X+WSI). Surprisingly, omitting self-loop information also has no significant impact on the performance of most models.

In most cases, aggregating only neighbor information achieves results comparable to the full GCN-based KGC models. Further, we randomly corrupt the adjacency tensor while omitting self-loop information. Since only neighbor information is used in this setting, random adjacency tensors would be expected to degrade performance significantly.

As shown in Table 5 (X+WSI+RAT), the performance of most decoders is only slightly affected. That is, aggregating only randomly generated neighbor information can achieve results comparable to the full GCN-based KGC models.

Which factor of GCNs is crucial in KGC? - Discussion

So far, we have seen that the following variants have no significant impact on the performance of GCN-based KGC models on FB237:

1) Using only self-loop information; 2) using only neighbor information; 3) using only randomly generated neighbor information.

These three cases share a common property: they are able to distinguish entities with different semantics with high confidence.

Specifically, if we only use self-loop information, the representation of each entity is independent and thus distinguishable.

If we only use neighbor information, two entities will have similar representations only if they have similar neighbor representations, which is consistent with the assumption of KGC : entities with similar neighbors have similar semantics. Therefore, we can distinguish entities with different semantics.

When randomly sampling neighboring entities from all entities, we assign different neighbors to different entities with high probability, so that we can distinguish different entities by aggregated entity representations.

Which factor of GCNs is crucial in KGC? - Linear transformation of relations

Different from RGCN and WGCN, CompGCN applies a linear transformation to relation embeddings. We conduct ablation experiments to explore the effect of this transformation.
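The operation being ablated is just a learned linear map on the relation embeddings; a schematic sketch (not CompGCN's actual implementation):

```python
def compgcn_relation_update(rel_emb, w_rel, ablate=False):
    """Sketch of the relation branch of a CompGCN-style layer: relation
    embeddings receive their own linear transformation in each layer
    (RGCN/WGCN have no such step). The ablation simply skips it."""
    return rel_emb if ablate else w_rel(rel_emb)
```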

Table 6 shows that removing the linear transformation of relations has no significant impact on performance, except for CompGCN+TransE on WN18RR. Note that TransE is sensitive to hyperparameters, and the authors did not use grid search to find the best ones, so the performance of CompGCN+TransE may be underestimated.

Therefore, we can conclude that the linear transformation of relations is not important for GCN-based KGC models.

The embedding update process of GCN-based KGC models has three main parts:

1) Aggregation based on the graph structure; 2) entity transformation; 3) relation transformation.

We have shown that graph-structure-based aggregation and the relation transformation are not important for GCN-based KGC models. Therefore, the transformation of the aggregated entity representations is what is crucial for the performance improvement.

A simple yet effective framework: LTE-KGE

Based on the above observations, the authors propose a simple but effective KGC framework, LTE-KGE, which uses linearly transformed entity representations to achieve performance similar to that of GCN-based KGC models.

The goal of this paper is not to propose a new state-of-the-art KGC model. Rather, we want to demonstrate that simpler models can achieve performance similar to state-of-the-art GCN-based models, and that existing complex GCNs may be unnecessary for KGC.

In the LTE-KGE scoring function, Wh and Wt are linear transformations with trainable weights, applied to the head and tail entity representations, respectively.

gh and gt are functions applied on top of the linear transformations; each can be a combination of functions from the set {identity function, non-linear activation function, batch normalization, dropout}. These operations correspond to the possible nonlinear transformations in GCN-based models.
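Putting this notation together, the LTE-KGE scoring function can be written roughly as (a reconstruction from the description here, not copied verbatim from the paper): score(h, r, t) = f( gh(Wh·eh), er, gt(Wt·et) ), where f is the scoring function of the underlying KGE model (e.g., DistMult or TransE) and eh, er, et are the head, relation, and tail embeddings.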

Note:

a) The linear transformations Wh and Wt can share the same parameters, according to the experimental results;

b) gh and gt can be combinations of different functions, according to the experimental results;

c) Since each entity has its own representation, LTE-KGE can distinguish entities with different semantics;

d) When Wh and Wt are identity matrices and gh and gt are identity functions, the LTE-KGE model reduces to the underlying KGE model.

Experimental results

The authors conducted experiments with DistMult, TransE, and ConvE as decoders. Specifically, Wh and Wt are the same, while gh and gt are the combination of batch normalization and dropout for DistMult/ConvE, and the identity function for TransE.
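As a concrete illustration (a sketch under the assumptions just described, not the authors' released code), an LTE wrapper around DistMult with a shared linear transformation followed by batch normalization and dropout could look like this:

```python
import torch
import torch.nn as nn

class LTEDistMult(nn.Module):
    """Sketch of LTE-KGE with a DistMult decoder: entity embeddings pass
    through a shared linear transformation W (Wh = Wt) followed by batch
    norm and dropout (gh = gt); no graph aggregation is performed."""
    def __init__(self, num_entities, num_relations, dim, dropout=0.3):
        super().__init__()
        self.ent = nn.Embedding(num_entities, dim)
        self.rel = nn.Embedding(num_relations, dim)
        self.w = nn.Linear(dim, dim)
        self.g = nn.Sequential(nn.BatchNorm1d(dim), nn.Dropout(dropout))

    def forward(self, heads, rels, tails):
        e_h = self.g(self.w(self.ent(heads)))
        e_t = self.g(self.w(self.ent(tails)))
        e_r = self.rel(rels)
        return (e_h * e_r * e_t).sum(-1)  # DistMult score
```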

Overall, LTE-KGE significantly improves the performance of DistMult and ConvE .

Although LTE-KGE does not explicitly model local graph structures like GCNs do, its performance is comparable to, and sometimes even better than, that of GCN-based KGC models.

The authors also use RotatE and TuckER as baselines. The results show that GCN-based models do not consistently show an advantage over these KGE models.

As mentioned before, the authors propose LTE-KGE to challenge GCNs, not to achieve state-of-the-art performance. Since popular GCN-based models do not use RotatE/TuckER as decoders, LTE-KGE is likewise not built on top of them.


Origin blog.csdn.net/cjw838982809/article/details/131850761