Interpretation of two highly cited knowledge graph (KG) papers | Two models: a multi-layer convolutional neural network (ConvE) and a knowledge-aware path recurrent network (KPRN)

1. Convolutional 2D Knowledge Graph Embeddings

Authors: Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel (University College London)

Paper source: The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018)


①Research question

Link prediction on knowledge graphs aims to predict missing relationships between entities. Traditional link prediction methods favour shallow, fast models because they scale to large KGs, but shallow models learn far fewer features than deep ones, which limits their performance. One remedy is to increase the embedding dimension, but that inflates the parameter count and again makes it hard to scale to large KGs. In addition, some existing datasets suffer from test-set leakage: many test triples can be obtained simply by inverting training triples, so a trivial rule-based model that memorises these inverses achieves state-of-the-art results. The article measures the severity of this problem with a simple inversion-based rule model and releases cleaned variants of the affected datasets.
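The inversion check described above can be sketched in a few lines. This is a minimal illustration, not the paper's code; all triples and relation names below are invented for the example.

```python
# A test triple (h, r, t) counts as "leaked" if a flipped version (t, r', h)
# already appears in the training set under some relation r'.

train = {("liverpool", "part_of", "england"),
         ("paris", "capital_of", "france"),
         ("berlin", "capital_of", "germany")}

test = [("england", "has_part", "liverpool"),   # flip of a training triple
        ("germany", "has_capital", "berlin"),   # flip of a training triple
        ("rome", "capital_of", "italy")]        # genuinely unseen

# Index training triples by (head, tail), ignoring the relation label.
train_pairs = {(h, t) for h, _, t in train}

leaked = [(h, r, t) for h, r, t in test if (t, h) in train_pairs]
print(f"{len(leaked)} of {len(test)} test triples are leaked")
```

A rule that simply predicts the inverse of a known training triple answers every leaked test case correctly, which is why such leakage inflates benchmark scores.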

②Research method

The article proposes a multi-layer convolutional neural network model (ConvE) for link prediction on knowledge graphs. Unlike the one-dimensional convolutions commonly used in natural language processing, ConvE reshapes the embeddings and stacks them into a matrix so that two-dimensional convolution kernels can capture interactions between them.

[Figure: the ConvE model architecture]

The process of the model is summarized as:

  • Look up the embeddings of the head entity and the relation, then reshape and stack them into a 2D matrix.
  • Apply multiple convolution kernels to the stacked matrix to obtain a feature map γ.
  • Vectorise γ and project it into k-dimensional space through a fully connected layer.
  • Multiply the result with the embeddings of the candidate tail entities to obtain a score for each.
  • Apply the sigmoid to each score to obtain a probability p, and train the model by minimising the cross-entropy loss.

It is worth noting that, unlike traditional models that score one triple at a time (1-1 scoring), ConvE takes an entity-relation pair as input and scores all candidate entities simultaneously (1-N scoring). This greatly accelerates computation: experiments show that increasing the number of entities tenfold raises computation time by only about 25%.

③Research results

[Table: link prediction results on the benchmark datasets]

The article conducts experiments on four datasets, WN18, FB15k, YAGO3-10, and Countries, and compares ConvE with models such as DistMult and R-GCN. Results show that ConvE with 0.23M parameters performs similarly to DistMult with 1.89M parameters; overall, ConvE is more than 17x as parameter-efficient as R-GCN and more than 8x as parameter-efficient as DistMult. The authors also found that ConvE outperforms shallow models by a wider margin on YAGO3-10 and FB15k-237 than on WN18RR, because the former two contain many very high-degree nodes: such complex KGs call for deeper models, whereas shallow models like DistMult remain competitive on simpler KGs.

 

2. Explainable Reasoning over Knowledge Graphs for Recommendation

Authors: Xiang Wang, Dingxian Wang, Canran Xu, Xiangnan He, Yixin Cao, Tat-Seng Chua (National University of Singapore, eBay)

Paper source: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2019)


①Research question

In recent years, integrating knowledge graphs into recommender systems has attracted growing attention. Paths from a user to an item in the knowledge graph provide rich supplementary information about user-item interactions: they reveal the semantics of entities and relations and help explain user interests. However, existing models fail to fully exploit these paths when inferring user preferences, especially in modelling a path's sequential dependencies and its holistic semantics. The article proposes the Knowledge-aware Path Recurrent Network (KPRN), which generates a path representation by combining the semantics of entities and relations. By exploiting the sequential dependencies within a path, the model can reason effectively over paths and thereby infer the rationale behind a user-item interaction. In addition, the article designs a new weighted pooling operation to discriminate the strengths of the different paths connecting a user and an item, which lends the model a degree of interpretability. The figure below shows an example of knowledge-graph-based music recommendation; dotted lines are relations and solid lines are user-item interaction paths.

[Figure: a music recommendation scenario over a knowledge graph]

②Research method

Knowledge graph and paths: a knowledge graph consists of a set of triples (h, r, t), each expressing a relation r between entities h and t. The knowledge graph in the article also incorporates user-item interactions, i.e. triples (user, interact, item), where "interact" is a predefined relation. A path is a sequence alternating between entities and relations that starts at a user and ends at an item. Given a user, an item, and the set of paths connecting them, the model should estimate the likelihood that the user will interact with the item, i.e. whether the triple (user, interact, item) holds.
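Under this definition, a path is just an alternating entity/relation sequence. A toy example (the names are invented for illustration; "interact" is the predefined user-item relation):

```python
# A user-to-item path in a music KG: Alice interacted with a song,
# the song is sung by an artist, and the artist sings another song.
path = ["Alice", "interact", "Shape of You", "sung_by", "Ed Sheeran",
        "sung_by_inv", "Castle on the Hill"]

entities = path[0::2]    # user, intermediate entities, and the target item
relations = path[1::2]   # the edges traversed along the path
assert len(entities) == len(relations) + 1   # a well-formed path ends on an entity
print(entities[0], "->", entities[-1])
```

The model's job is then to score how strongly each such sequence supports a (user, interact, item) triple.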

[Figure: the KPRN model architecture]

The model has three layers. The embedding layer computes an embedding for each element on the path: for each step it embeds the entity name, the entity type, and the relation (or interaction) separately, then concatenates them into the final feature representation. The LSTM layer consumes the feature representations along the path in order and takes the hidden state at the final step as the representation of the whole path. The pooling layer feeds the representations of all paths through a two-layer feed-forward network to obtain one score per path, then applies weighted pooling to the scores to produce the final prediction.
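The weighted pooling step can be sketched as follows, under the assumption that it is a temperature-scaled log-sum-exp of the per-path scores (with the temperature γ controlling how sharply the strongest path dominates); treat this as an illustrative form rather than the paper's exact definition.

```python
import numpy as np

def weighted_pooling(scores, gamma=1.0):
    # Temperature-scaled log-sum-exp over per-path scores; a smaller gamma
    # pushes the pooling towards considering only the strongest path.
    s = np.asarray(scores, dtype=float) / gamma
    m = s.max()
    return m + np.log(np.exp(s - m).sum())    # numerically stable log-sum-exp

def predict(path_scores, gamma=1.0):
    # Final interaction probability: sigmoid of the pooled score.
    return 1.0 / (1.0 + np.exp(-weighted_pooling(path_scores, gamma)))

print(predict([0.355, 0.289, 0.356], gamma=0.1))
```

Because the pooled score is a smooth maximum, the gradient flows back to every path in proportion to its exponentiated score, which is what lets the model attribute a prediction to the most important paths.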

③Research results

The article evaluates the model on the public MovieLens-1M movie dataset and the KKBox music dataset. Compared with methods that only map each entity to a vector representation, KPRN can additionally mine the reasons behind user-item interactions from the paths, which improves the interpretability of the model.

[Figure: a path-based explanation for one user-item pair]

As shown in the figure above, a user (u4825) is randomly selected from MovieLens-1M, along with the movie "Shakespeare in Love" from her interaction record. All qualifying paths connecting the user-item pair are extracted, and the model scores them as s1 = 0.355, s2 = 0.289, and s3 = 0.356; that is, the model most strongly attributes user u4825's interaction with "Shakespeare in Love" to path 3.

 


Origin blog.csdn.net/AMiner2006/article/details/103523520