Interpretation of the latest authoritative review paper on knowledge graphs: the knowledge representation learning part

In the previous issue, we briefly introduced the opening part of the latest authoritative review paper "A Survey on Knowledge Graphs: Representation, Acquisition and Applications" in 2020. In this issue, we will study the knowledge representation learning part of this paper together.

Paper address:
https://arxiv.org/pdf/2002.00388.pdf

Review of the previous issue:
Interpretation of the latest authoritative review paper on knowledge graphs: the opening part

Knowledge Graph Representation Learning

Knowledge graph representation learning plays an important role in knowledge acquisition and downstream applications. The representation spaces used for knowledge representation learning include point-wise space, manifold space, complex vector space, Gaussian distribution, and discrete space. Scoring functions are usually divided into distance-based and semantic-matching-based scoring functions. Encoding models include linear/bilinear models, tensor decomposition models, and neural networks. Auxiliary information covers textual, visual, and type information.

1 Representation space

1.1 Point-wise space

Point-wise Euclidean space is the most widely used; it embeds entities and relations of a knowledge graph into vector or matrix spaces, and some methods additionally capture the interactions between entities and relations. Point-wise methods include the translation-based TransE, which requires the translation invariance h + r ≈ t, its extensions TransR and TransH, and the semantic matching methods NTN, HolE, and ANALOGY.
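
As a concrete illustration, here is a minimal NumPy sketch of the TransE scoring idea (h + r ≈ t); the embedding dimension and random vectors are illustrative, not the paper's training setup.

```python
# A minimal sketch of the TransE scoring idea (h + r ≈ t), not the paper's
# training code; the dimension and random embeddings are illustrative.
import numpy as np

def transe_score(h, r, t, norm=1):
    # Lower distance ||h + r - t|| means a more plausible triple.
    return -np.linalg.norm(h + r - t, ord=norm)

rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 50))   # 50-dimensional embeddings (illustrative)
print(transe_score(h, r, t))         # score of one (head, relation, tail) triple
```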

1.2 Complex vector spaces

Extending from the real space to the complex space yields richer representations of entities and relations. ComplEx was the first to extend knowledge graph representation learning to the complex space; entities and relations interact through a Hermitian dot product, which can model both symmetric and antisymmetric relations. RotatE treats each relation as a rotation from the head entity to the tail entity, implemented as an element-wise (Hadamard) product in the complex space. QuatE further extends the complex space to the quaternion space, with one real and three imaginary components, and combines the head entity and the relation through the Hamilton product (quaternion multiplication).
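
A hedged sketch of complex-valued scoring: ComplEx takes the real part of a trilinear product with the conjugated tail, and RotatE applies an element-wise rotation; the dimension and the unit-modulus construction below are illustrative choices.

```python
# Hedged sketches of complex-valued scores; sizes and random values are illustrative.
import numpy as np

def complex_score(h, r, t):
    # ComplEx-style score: real part of the trilinear product with conjugated tail.
    return np.real(np.sum(h * r * np.conj(t)))

def rotate_score(h, r_phase, t):
    r = np.exp(1j * r_phase)               # unit-modulus relation = a rotation
    return -np.linalg.norm(h * r - t)      # rotated head should land near the tail

rng = np.random.default_rng(0)
h = rng.normal(size=20) + 1j * rng.normal(size=20)
t = rng.normal(size=20) + 1j * rng.normal(size=20)
r = rng.normal(size=20) + 1j * rng.normal(size=20)
print(complex_score(h, r, t), rotate_score(h, rng.uniform(0, 2 * np.pi, 20), t))
```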

1.3 Gaussian distribution

Inspired by Gaussian word embedding, the KG2E model embeds entities and relations as multidimensional Gaussian distributions: the mean vector gives the position of an entity or relation, and the covariance matrix models its uncertainty. TransG represents entities with Gaussian distributions and uses a mixture of Gaussians for relation embeddings, so that one relation can carry multiple semantics.
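
A hedged sketch of scoring with Gaussian embeddings in the spirit of KG2E, assuming diagonal covariances: the entity pair is summarized by N(mu_h − mu_t, var_h + var_t) and compared to the relation Gaussian with a closed-form KL divergence. Shapes and values are illustrative.

```python
# Hedged sketch of Gaussian-embedding scoring with diagonal covariances.
import numpy as np

def kl_diag_gaussians(mu0, var0, mu1, var1):
    # KL( N(mu0, diag(var0)) || N(mu1, diag(var1)) ), standard closed form.
    return 0.5 * np.sum(var0 / var1 + (mu1 - mu0) ** 2 / var1 - 1.0 + np.log(var1 / var0))

def gaussian_triple_score(mu_h, var_h, mu_r, var_r, mu_t, var_t):
    mu_e, var_e = mu_h - mu_t, var_h + var_t   # distribution of the entity pair
    return -kl_diag_gaussians(mu_e, var_e, mu_r, var_r)   # higher = more plausible

rng = np.random.default_rng(0)
mus = rng.normal(size=(3, 30))
vars_ = rng.uniform(0.5, 1.5, size=(3, 30))
print(gaussian_triple_score(mus[0], vars_[0], mus[1], vars_[1], mus[2], vars_[2]))
```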

1.4 Manifolds and groups

A manifold is a topological space, which can be defined in set theory as a set of points each with a neighborhood, while a group is an algebraic structure defined in abstract algebra. The earlier point-wise modeling yields an ill-posed algebraic system, so ManifoldE relaxes point-wise embedding to manifold-based embedding and introduces two settings, sphere-based and hyperplane-based.

In the sphere-based setting, entities and relations are mapped from the original space into a Hilbert space (with a reproducing kernel), and a valid triple is required to lie on a sphere centered at h + r with a relation-specific radius.
TorusE instead embeds entities and relations on an n-dimensional torus, a compact Lie group, and learns the representation following TransE's translation idea h + r ≈ t.
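
A hedged sketch of the translation idea on the n-torus [0, 1)^n used by TorusE, where coordinates wrap around and the per-dimension distance is the shorter arc; the dimension and the L1-style torus distance are illustrative choices.

```python
# Hedged sketch of translation on the torus; dimension and metric are illustrative.
import numpy as np

def torus_distance(x, y):
    d = np.abs((x - y) % 1.0)
    return np.sum(np.minimum(d, 1.0 - d))   # shorter arc in each dimension

def toruse_score(h, r, t):
    return -torus_distance((h + r) % 1.0, t)  # h + r should land near t on the torus

rng = np.random.default_rng(0)
h, r, t = rng.uniform(0, 1, size=(3, 40))
print(toruse_score(h, r, t))
```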

2 Scoring functions

The scoring function measures the plausibility of a triple; it is sometimes called the energy function, following the energy-based learning framework. The goal of energy-based learning is to make positive samples score higher than negative samples (in TransE the distance score of negative samples is larger than that of positive samples, which I think mainly depends on how the scoring function is defined). Scoring functions are usually divided into two families: distance-based scoring functions and semantic-matching-based scoring functions.

(1) Distance-based scoring functions measure the plausibility of a triple by the distance between the two entities, with the relation acting as an additive translation, as in TransE's h + r ≈ t.
(2) Semantic-matching-based scoring functions are computed through multiplicative composition of the head entity vector and the relation matrix, so that the head entity combined with the relation is carried close to the tail entity, i.e. h^T M_r ≈ t^T.
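
For reference, the two canonical forms can be written side by side (a TransE-style distance and a generic bilinear matcher, using the h, r, t and M_r notation of this section):

```latex
% Compact comparison of the two scoring families discussed above.
\[
\underbrace{f_r(h,t) = -\lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert}_{\text{distance-based (TransE-style)}}
\qquad
\underbrace{f_r(h,t) = \mathbf{h}^{\top}\mathbf{M}_r\,\mathbf{t}}_{\text{semantic matching (bilinear)}}
\]
```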

2.1 Scoring function based on distance

The SE model learns entity and relation embeddings with two relation-specific projection matrices and the L1 norm, measuring the distance between the projected head and tail entities.
After that came the translation-based TransE model we all know well, which scores a triple by the distance ||h + r - t||.
From then on, a large number of variants and extended versions of TransE were proposed, for example TransR, which projects entity representations into the relation-specific space before applying the translation, and TransD, which builds dynamic mapping matrices from projection vectors of the entities and the relation.
To achieve adaptive metric learning, TransA replaces the Euclidean distance with a Mahalanobis distance.
Beyond the purely additive scoring functions above, TransF extends the purely translation-based operation to dot-product operations.
In addition, also following the translation idea, KG2E adopts a Gaussian space and designs its scoring function in two ways:
(1) an asymmetric KL divergence between the entity-pair distribution and the relation distribution;
(2) a symmetric expected likelihood.
ManifoldE adopts a manifold space and scores a triple by how far the manifold function deviates from the squared radius of the relation-specific manifold.
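
Below is a hedged NumPy sketch of a few of the distance-based scores just discussed (SE, TransR, and TransA); the embedding dimensions, random matrices, and omitted constraints/regularization are illustrative assumptions rather than the papers' training setups.

```python
# Hedged sketches of translation-family scores; shapes and values are illustrative.
import numpy as np

def se_score(h, t, M_r1, M_r2):
    return -np.linalg.norm(M_r1 @ h - M_r2 @ t, ord=1)      # SE: two projections, L1 norm

def transr_score(h, r, t, M_r):
    return -np.linalg.norm(M_r @ h + r - M_r @ t) ** 2      # TransR: translate in relation space

def transa_score(h, r, t, W_r):
    d = np.abs(h + r - t)
    return -(d @ W_r @ d)                                    # TransA: Mahalanobis-style adaptive metric

rng = np.random.default_rng(0)
d_e, d_r = 40, 30
h, t = rng.normal(size=(2, d_e))
r = rng.normal(size=d_r)
M_r = rng.normal(size=(d_r, d_e))
W_r = np.eye(d_e)                                            # a PSD weight matrix (illustrative)
print(se_score(h, t, rng.normal(size=(d_r, d_e)), rng.normal(size=(d_r, d_e))))
print(transr_score(h, r, t, M_r), transa_score(h, r, t, W_r))
```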

2.2 Semantic matching models

Another idea for designing a scoring function is to compute semantic similarity. SME measures the semantic matching between the entity-relation pairs (h, r) and (t, r). DistMult proposes a simplified bilinear model whose score is the product of the head entity vector, a diagonal relation matrix, and the tail entity vector: f_r(h, t) = h^T diag(r) t. HolE introduces a circular correlation of the embedding representations, which can be interpreted as a compressed tensor product; its scoring function is f_r(h, t) = r^T (h ⋆ t), where ⋆ denotes circular correlation.

ANALOGY models the analogical structure of relational data with a bilinear score, where the relation matrices are constrained to be normal matrices.

In addition to learning vector representations of entities and relations, CrossE also learns a relation-specific interaction matrix C, which is used to generate interaction-aware representations of entities and relations.
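
The two semantic-matching scores above with simple closed forms can be sketched as follows: DistMult is the diagonal bilinear product, and HolE's circular correlation is computed with the standard FFT identity. Embedding sizes and random vectors are illustrative.

```python
# Hedged sketches of semantic-matching scores; sizes are illustrative.
import numpy as np

def distmult_score(h, r, t):
    return np.sum(h * r * t)                      # h^T diag(r) t

def circular_correlation(a, b):
    # Standard FFT identity for circular correlation.
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

def hole_score(h, r, t):
    return np.dot(r, circular_correlation(h, t))  # r^T (h ⋆ t)

rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 64))
print(distmult_score(h, r, t), hole_score(h, r, t))
```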

3 Encoding models

Interactions between entities and relations can be encoded through specific model structures, including linear/bilinear models, tensor decomposition models, and neural network models.

3.1 Linear/bilinear models

Linear/bilinear models encode the interactions between entities and relations with a linear operation, g_r(h, t) = M_r^T [h; t], or a bilinear operation, g_r(h, t) = h^T M_r t. Models of this family include SE, SME, DistMult, ComplEx, and ANALOGY. Interestingly, TransE with the L2 norm can also be rewritten in a bilinear form by expanding the squared distance ||h + r - t||^2.
To address the problem that head and tail embeddings are learned independently in canonical Polyadic (CP) decomposition, SimplE introduces inverse relations and averages the CP scores of a triple and its inverse triple.
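
A hedged sketch of SimplE's use of inverse relations: in the usual formulation, each entity keeps a head-role and a tail-role embedding, each relation keeps a forward and an inverse vector, and the two CP terms are averaged. Variable names and sizes are illustrative.

```python
# Hedged sketch of averaging a CP term with its inverse-relation term.
import numpy as np

def simple_score(head_i, tail_i, head_j, tail_j, r, r_inv):
    # average of the CP score for (i, r, j) and the inverse score for (j, r_inv, i)
    return 0.5 * (np.sum(head_i * r * tail_j) + np.sum(head_j * r_inv * tail_i))

rng = np.random.default_rng(0)
head_i, tail_i, head_j, tail_j, r, r_inv = rng.normal(size=(6, 32))
print(simple_score(head_i, tail_i, head_j, tail_j, r, r_inv))
```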

3.2 Tensor decomposition model

The basic idea of tensor decomposition models is to factorize each slice of a third-order tensor into the product of entity vectors and a relation matrix in a low-dimensional space. In the RESCAL model, for a knowledge graph with m relations in total, the slice X_k associated with the k-th relation, which encodes that relation over all entity pairs, is factorized as X_k ≈ A R_k A^T, where each row of A is an entity embedding and R_k models the interactions under relation k.

LFM extends RESCAL by further decomposing the relation matrices through a bilinear structure.
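
A small NumPy sketch of the RESCAL view described above; the entity matrix A, the relation slice R_k, and their sizes are illustrative.

```python
# Hedged sketch of the RESCAL factorization X_k ≈ A R_k A^T; sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_entities, dim = 5, 3
A = rng.normal(size=(n_entities, dim))     # one row per entity
R_k = rng.normal(size=(dim, dim))          # dense matrix for relation k

X_k_approx = A @ R_k @ A.T                 # reconstructed slice of the relation tensor
score = A[0] @ R_k @ A[2]                  # plausibility of (entity 0, relation k, entity 2)
print(X_k_approx.shape, score)
```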

3.3 Neural Network Model

Neural networks can also be used to encode the interactions between entities and relations beyond linear/bilinear forms: entity and relation representations are fed into a deep neural network, which outputs a semantic matching score. MLP feeds the entity and relation embeddings together into a fully connected layer and uses a sigmoid activation in the second layer to score a triple. The NTN model takes the entity embeddings and a relation tensor as input, combines a bilinear tensor term with a linear term, and outputs a score for the triple.
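
Hedged sketches of the two neural scorers above; the layer sizes, activations, and random weights are illustrative simplifications rather than the original architectures.

```python
# Hedged sketches of MLP-style and NTN-style scoring; shapes are illustrative.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_score(h, r, t, W, w):
    hidden = np.tanh(W @ np.concatenate([h, r, t]))   # first fully connected layer
    return sigmoid(w @ hidden)                        # second layer scores the triple

def ntn_score(h, t, W_tensor, V, b, u):
    bilinear = np.array([h @ W_tensor[k] @ t for k in range(W_tensor.shape[0])])
    return u @ np.tanh(bilinear + V @ np.concatenate([h, t]) + b)

rng = np.random.default_rng(0)
d, k = 20, 4
h, r, t = rng.normal(size=(3, d))
print(mlp_score(h, r, t, rng.normal(size=(32, 3 * d)), rng.normal(size=32)))
print(ntn_score(h, t, rng.normal(size=(k, d, d)), rng.normal(size=(k, 2 * d)),
                rng.normal(size=k), rng.normal(size=k)))
```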

3.4 Convolutional Neural Networks

As we all know, CNNs have strong feature extraction ability on images. To bring CNNs into knowledge graph representation learning, the ConvE model first reshapes the head entity and relation embeddings into a 2D matrix and then applies 2D convolution to model the interactions between entities and relations; the resulting feature maps are vectorized, projected, and matched against the tail entity. Here, the convolution kernels are shared filters, and vec(·) is the vectorization operation that reshapes a tensor into a vector.
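
A hedged, heavily simplified sketch of the ConvE pipeline (a single filter, no batch normalization or dropout, illustrative reshaping sizes):

```python
# Hedged sketch of a ConvE-style score: reshape, convolve, flatten, project, match tail.
import numpy as np

def conv2d_valid(x, kernel):
    kh, kw = kernel.shape
    H, W = x.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def conve_score(h, r, t, kernel, W_proj):
    stacked = np.concatenate([h.reshape(4, 8), r.reshape(4, 8)], axis=0)   # 8 x 8 "image"
    feat = np.maximum(conv2d_valid(stacked, kernel), 0.0).ravel()          # ReLU feature map
    return feat @ W_proj @ t                                               # project, match the tail

rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 32))
kernel = rng.normal(size=(3, 3))
W_proj = rng.normal(size=(6 * 6, 32))
print(conve_score(h, r, t, kernel, W_proj))
```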

ConvKB directly concatenates the embeddings of the head entity, the relation, and the tail entity into a matrix and then applies a CNN over it.

Comparing ConvE and ConvKB: ConvE captures local relational features; what I understand is that its convolution kernels only operate on the reshaped embeddings of the head entity and the relation, so the tail entity is not involved in the convolution. ConvKB combines the head entity, relation, and tail entity of a triple through concatenation, so it has stronger feature learning ability while keeping certain translational properties, and it achieved better experimental results.

3.5 Recurrent Neural Networks

The MLP-based and CNN-based methods above only encode single triples and do not consider long-term relational dependencies; relation paths in the knowledge graph, for example, call for RNN-based modeling. RSN designs a recurrent skipping mechanism that enhances the semantic representation by distinguishing entities from relations. For a relation path generated by random walks over the knowledge graph, an ordinary RNN produces the hidden states, and the so-called skipping operation lets the output at a relation position also incorporate the preceding (subject) entity, so that subject entities are not forgotten along the path.
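
A hedged sketch of the recurrent skipping idea, assuming a plain tanh RNN cell and a skip that mixes the hidden state with the preceding (subject) entity at relation positions; the weight shapes and the toy path are illustrative.

```python
# Hedged sketch of an RNN over an entity-relation path with a skipping output.
import numpy as np

def rsn_pass(path, is_relation, W_h, W_x, S1, S2):
    d = W_h.shape[0]
    h = np.zeros(d)
    outputs = []
    for step, x in enumerate(path):
        h = np.tanh(W_h @ h + W_x @ x)                       # ordinary RNN update
        if is_relation[step] and step > 0:
            outputs.append(S1 @ h + S2 @ path[step - 1])     # skip in the preceding entity
        else:
            outputs.append(h)
    return outputs

rng = np.random.default_rng(0)
d = 16
path = rng.normal(size=(4, d))            # e1, r1, e2, r2 (illustrative)
is_relation = [False, True, False, True]
mats = rng.normal(size=(4, d, d)) * 0.1   # W_h, W_x, S1, S2
print(len(rsn_pass(path, is_relation, *mats)))
```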

3.6 Transformer

Transformers have become a hot technology in NLP, especially for pre-trained language models represented by BERT. Transformer-based representation learning can integrate contextual information in knowledge graphs. CoKE uses a Transformer to encode edge and path sequences. Inspired by pre-trained language models, KG-BERT uses a Transformer to encode entities and relations.

3.7 Graph neural network

GNNs can learn the connectivity structure of a graph under an encoder-decoder framework. RGCN proposes relation-specific transformations to model the directed nature of knowledge graphs. The state of entity i at layer l+1, aggregating its neighborhood information, is

h_i^(l+1) = sigma( sum_{r in R} sum_{j in N_i^r} (1 / c_{i,r}) W_r^(l) h_j^(l) + W_0^(l) h_i^(l) )

Here the R-GCN acts as an encoder, and different task-specific decoders can be chosen and integrated into the RGCN framework. Since RGCN treats the neighborhood information of each entity equally, SACN instead designs a weighted GCN that defines the connection strength of two adjacent nodes sharing the same relation type. Its decoder module uses a ConvE-like scoring function as the semantic matching metric: with C convolution kernels, it computes the convolutional output of the entity and relation embeddings and matches it against the tail entity.
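
A hedged NumPy sketch of one RGCN-style propagation step following the aggregation formula above, with a simplified per-node normalization in place of the paper's per-relation constants c_{i,r}; the toy graph and sizes are illustrative.

```python
# Hedged sketch of relation-specific neighborhood aggregation (one layer).
import numpy as np

def rgcn_layer(H, triples, W_rel, W_self):
    # H: (num_entities, d) current states; triples: list of (head, rel, tail) indices.
    num_entities, d = H.shape
    agg = H @ W_self.T                                 # self-loop term W_0 h_i
    counts = np.zeros(num_entities)
    messages = np.zeros_like(agg)
    for h_idx, r_idx, t_idx in triples:
        # message from the tail neighbor via relation r (inverse directions omitted here)
        messages[h_idx] += W_rel[r_idx] @ H[t_idx]
        counts[h_idx] += 1
    agg += messages / np.maximum(counts, 1.0)[:, None]  # simple per-node normalization
    return np.maximum(agg, 0.0)                          # ReLU activation

rng = np.random.default_rng(0)
num_entities, num_rels, d = 4, 2, 8
H = rng.normal(size=(num_entities, d))
W_rel = rng.normal(size=(num_rels, d, d)) * 0.1
W_self = rng.normal(size=(d, d)) * 0.1
triples = [(0, 0, 1), (0, 1, 2), (3, 0, 0)]
print(rgcn_layer(H, triples, W_rel, W_self).shape)
```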

The KBAT model concatenates the embeddings of entities and relations and then employs a graph attention network with multi-head attention to encode multi-hop neighborhood information.

4 Combining auxiliary information

To further improve the performance of representation learning, multi-modal embeddings can be learned by combining the knowledge graph itself with external auxiliary information, including textual descriptions, type constraints, relation paths, visual information, and logical rules.

4.1 Text description

Entity descriptions in a knowledge graph supply complementary semantic information. The challenge of knowledge representation learning with textual descriptions is how to embed the structured knowledge graph and the unstructured text in the same representation space. One approach uses an alignment model that aligns the entity space and the word space by introducing entity names and Wikipedia anchors. DKRL extends TransE and uses a CNN to learn representations from entity descriptions. SSP maps triples and textual descriptions into a semantic space and combines an embedding-specific loss and a topic-specific loss into the overall objective.

4.2 Type information

Entities in knowledge graphs sometimes come with hierarchical categories or types. The SSE model uses entity type information so that entities belonging to the same type are embedded close to each other. TKRL captures hierarchical type information through type-specific projection matrices for entities. KR-EAR divides relation types into attributes and relations and models the correlations between entity attributes.

4.3 Visual Information

Visual information, such as images of entities, can also be used to enhance knowledge representation learning. IKRL encodes images into the entity space, ensures that the triple-based structured representations and the image-based representations lie in the same representation space, and learns entity and relation embeddings following the translation principle.

5 Summary

The most popular representation spaces are based on Euclidean space, embedding entities into vector spaces and modeling the interactions between entities and relations via vectors, matrices, or tensors. Other representation spaces include complex vector spaces, Gaussian distributions, and manifold spaces and groups. Compared with point-wise Euclidean space, manifold spaces relax the strict point-wise embedding. Gaussian embeddings can represent the uncertainty of entities and relations as well as the multiple semantics of relations. Embeddings in complex vector spaces can effectively model different relational connectivity patterns, especially symmetric/antisymmetric patterns. The representation space plays an important role in encoding the semantic information of entities and capturing relational properties; when designing a representation learning model, an appropriate representation space should be carefully selected to match the encoding method and to balance expressiveness against computational complexity.

Distance-based scoring functions follow the translation principle, while semantic matching scoring functions use compositional operators. Encoding models, especially neural networks, play a crucial role in modeling the interactions of entities and relations. Bilinear models have also attracted a lot of attention, and some tensor decomposition models can be viewed as members of this family. Other methods incorporate auxiliary information such as textual descriptions, relation/entity types, and entity images.

The most cutting-edge knowledge representation learning models of recent years are summarized in a table in the original paper.
Later, we will also interpret the knowledge acquisition and knowledge application parts of this review.

Welcome to the WeChat public account "Artificial Intelligence Meets Knowledge Graph" and the Zhihu column "Artificial Intelligence Meets Knowledge Graph". Let us learn and discuss artificial intelligence and knowledge graph technology together.

Origin blog.csdn.net/ngl567/article/details/106201987