A taste of the paper | Knowledge graph completion and multi-hop reasoning in large-scale knowledge graphs


Notes by: Liu Jianyu, master's student at Southeast University; research direction: knowledge graph rule learning and reasoning

Link: https://dl.acm.org/doi/abs/10.1145/3534678.3539405

Motivation

Knowledge graphs (KGs) capture knowledge in the form of head-relation-tail triples and are an important component of many artificial intelligence systems. There are two important reasoning tasks on KGs: (1) single-hop knowledge graph completion, which predicts individual links in the KG; (2) multi-hop reasoning, whose goal is to predict which KG entities satisfy a given logical query. Embedding-based methods address both tasks by first computing an embedding for each entity and relation, and then using these embeddings to form predictions. However, existing scalable KG embedding frameworks only support single-hop knowledge graph completion and cannot be applied to the more challenging multi-hop reasoning task.

There are two major challenges in embedding-based multi-hop reasoning over KGs: (1) Algorithmically, given a huge KG (with hundreds of millions of entities), materializing all training examples is no longer feasible. (2) At the system level, existing single-hop large-scale KG embedding frameworks are based on graph partitioning, but multi-hop reasoning must traverse multiple relations in the graph, which usually span several partitions, making such frameworks difficult to apply.

This paper proposes Scalable Multi-hOp REasoning (SMORE) to support both single-hop and multi-hop reasoning on large-scale knowledge graphs. For challenge (1), SMORE generates positive and negative samples online by instantiating queries on the KG, and proposes a bidirectional rejection sampling method to efficiently obtain high-quality negative samples per query. For challenge (2), it designs an asynchronous scheduler that maximizes GPU throughput by overlapping sampling, asynchronous embedding reads/writes, neural network feed-forward, and optimizer updates, thereby accelerating training and inference on large-scale KGs.

Contributions

The main contributions of this paper are:

(1) A framework called SMORE is proposed, which is the first general framework to support single-hop and multi-hop reasoning in knowledge graphs.

(2) A novel bidirectional rejection sampling method is proposed, which achieves a square root reduction in the complexity of online training data generation.

(3) A distributed training and asynchronous update mechanism is designed that pipelines the stages of each stochastic gradient update, avoiding heavy CPU/GPU memory reads and writes.

Method

1.  Training data sampling

Since it is infeasible to train over all instances of a large-scale knowledge graph to obtain embedding representations, this paper proposes dynamically sampling and instantiating queries to obtain training instances for contrastive learning of multi-hop reasoning.

To generate a training example with a set of positive and negative entities, a query is first instantiated on the given knowledge graph from a set of query logic structures (see Figure 1). The root of the instantiated query represents a known positive (answer) entity; reverse sampling then performs a depth-first search (DFS) over the logical structure of the query from the root (answer) to the leaves (anchor entities). During the DFS, each node in the query structure is grounded to an entity in the knowledge graph, and each edge to a relation connecting the previously grounded entities. Once the nodes and edges of the query structure have been grounded in this way, a positive sample is obtained (a minimal sketch follows Figure 1).


Figure 1 Different query logic structures and optimal node cuts (shaded nodes) used for bidirectional rejection sampling
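
To make the positive sampling concrete, here is a minimal sketch of instantiating a relation-path query by reverse DFS, assuming the KG is stored as a reverse adjacency dict (`reverse_adj[tail][relation] -> list of heads`); all names are illustrative, not SMORE's API:

```python
import random

def instantiate_path_query(reverse_adj, depth):
    """Ground a depth-hop path query by DFS from a sampled answer entity."""
    answer = random.choice(list(reverse_adj))       # root = known positive answer
    node, relations = answer, []
    for _ in range(depth):                          # walk root -> leaf (anchor)
        if not reverse_adj.get(node):               # dead end: caller resamples
            return None
        relation = random.choice(list(reverse_adj[node]))
        node = random.choice(reverse_adj[node][relation])
        relations.append(relation)
    # anchor entity, relation path in anchor -> answer order, positive answer
    return node, list(reversed(relations)), answer
```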

For negative samples, this paper uses bidirectional rejection sampling. Inspired by bidirectional search, a node cut is first obtained in the query computation plan, i.e., a subset of nodes that cuts all paths between each leaf node and the root node. A bidirectional search is then performed: traverse from the leaves (anchors) to the node cut, caching the entities reached along the way (forward caching); then sample candidate negative entities, traverse from the root back to the node cut, and verify that a candidate is a true negative by checking that its traversed set does not overlap the cached entities (backward verification).
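
A minimal sketch of this procedure for a simple relation-path query, assuming a forward adjacency dict (`adj[head][relation] -> list of tails`); the node cut is taken at the midpoint of the path, and all names are illustrative rather than SMORE's API:

```python
import random

def adj_inverse(adj, r, t):
    # Naive inverse lookup; a real system would precompute reverse adjacency.
    return [h for h, rels in adj.items() if t in rels.get(r, [])]

def bidir_negative_samples(adj, anchor, relations, entities, k, max_tries=1000):
    """Sample up to k verified negative answers for the query (anchor, r1..rd)."""
    cut = len(relations) // 2
    forward = {anchor}
    for r in relations[:cut]:                       # forward phase: anchor -> node cut
        forward = {t for h in forward for t in adj.get(h, {}).get(r, [])}
    cache = frozenset(forward)                      # forward cache at the node cut

    negatives = []
    for _ in range(max_tries):
        if len(negatives) >= k:
            break
        cand = random.choice(entities)              # propose a candidate negative
        backward = {cand}
        for r in reversed(relations[cut:]):         # backward phase: root -> node cut
            backward = {h for t in backward for h in adj_inverse(adj, r, t)}
        if cache.isdisjoint(backward):              # no meeting point => true negative
            negatives.append(cand)
    return negatives
```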

2.  Training strategy

SMORE uses the CPU and GPU jointly: dense matrix computation is deployed on the GPU, while sampling operations run on the CPU.

For large KGs with millions of entities or more, the embedding matrix cannot fit in GPU memory, so SMORE places it in shared CPU memory while keeping the other parameters, such as the neural logic operators, on each individual GPU. The distributed training procedure is as follows (a minimal sketch follows the list):

(1) Collect a mini-batch of training samples from the sampler.

(2) Load the relevant entity embeddings (rows of θ_sparse) from CPU to GPU.

(3) Compute the gradients ∇θ_sparse and ∇θ_dense locally and update the local copy of θ_dense.

(4) Asynchronously apply ∇θ_sparse to the shared embedding matrix θ_sparse.

Here θ_sparse is the embedding matrix in shared CPU memory and θ_dense is the per-GPU copy of the remaining parameters.
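
The following is a minimal single-process sketch of one such update step, not SMORE's actual implementation; the sizes, the torch.nn.Linear stand-in for the neural logic operators, and the squared-error loss are placeholder assumptions.

```python
import torch

# Hypothetical sizes; Linear stands in for the (unspecified) logic operators.
num_entities, dim = 1_000_000, 400
emb = torch.zeros(num_entities, dim).share_memory_()    # theta_sparse, shared CPU memory
operator = torch.nn.Linear(dim, dim).cuda()             # theta_dense, replicated per GPU

def train_step(batch_ids, targets, lr=0.01):
    rows = emb[batch_ids].cuda().requires_grad_(True)   # (2) read embeddings CPU -> GPU
    loss = ((operator(rows) - targets) ** 2).mean()     # (3) feed-forward, placeholder loss
    loss.backward()                                     #     local gradients
    with torch.no_grad():
        for p in operator.parameters():                 #     dense update; under multi-GPU
            p -= lr * p.grad                            #     training this follows an
            p.grad = None                               #     AllReduce over the gradients
    emb[batch_ids] -= lr * rows.grad.cpu()              # (4) write-back to shared theta_sparse
```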

To avoid heavy CPU/GPU memory reads and writes, this paper proposes an asynchronous mechanism that pipelines the stages of each stochastic gradient update, using four meta-threads:

(1) Multi-threaded sampler: Each worker maintains a sampler with access to the shared KG. The sampler contains a thread pool that samples queries in parallel and fetches the corresponding positive/negative answers.

(2) Sparse embedding read/write: For the embedding matrix θ_sparse, a separate background thread and a dedicated CUDA stream handle embedding reads and writes. When loading the embedding of an entity to the GPU, the background thread first copies it into a pinned memory region, and the asynchronous CUDA stream then performs the pinned-memory-to-GPU copy. The read is non-blocking and only synchronizes when a CUDA operator in the main CUDA stream requests the data. Writes are similar but in the opposite direction (see the sketch after this list).

(3) Dense computation: Once the training data is ready and the anchor-entity embeddings have arrived on the GPU, the model starts the feed-forward pass. After the local gradients ∇θ_sparse and ∇θ_dense are obtained, the asynchronous, non-blocking update of θ_sparse is issued first, the AllReduce over ∇θ_dense starts next, and the dense parameter update of θ_dense is then performed on the GPU.

(4) Sparse optimizer with asynchronous reads and writes: Unlike θ_dense, only a small fraction of the rows of θ_sparse are involved in each stochastic update. Therefore only the rows corresponding to the anchor entities and the positive/negative entities, together with their gradients and optimizer state, are tracked.
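
To illustrate the mechanism in meta-thread (2), here is a minimal PyTorch sketch of a non-blocking pinned-memory read on a dedicated CUDA stream; the buffer sizes are arbitrary and this illustrates the general technique, not SMORE's code:

```python
import torch

copy_stream = torch.cuda.Stream()                   # dedicated stream for embedding I/O
staging = torch.empty(1024, 400, pin_memory=True)   # pinned staging buffer (arbitrary size)

def async_read(emb_cpu, ids, out_gpu):
    # In SMORE this first step is done by the background thread: CPU -> pinned memory.
    staging[:len(ids)].copy_(emb_cpu[ids])
    with torch.cuda.stream(copy_stream):
        # Pinned -> GPU copy, asynchronous with respect to the host.
        out_gpu.copy_(staging[:len(ids)], non_blocking=True)
    # Device-side ordering only: kernels on the main stream that consume
    # out_gpu wait for the copy to finish; the host thread is never blocked.
    torch.cuda.current_stream().wait_stream(copy_stream)
```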

Experiments

This paper evaluates SMORE on KG completion and on multi-hop reasoning over KGs. In the experimental task, given an incomplete KG, the goal is to train query embedding methods to discover the missing answers to complex logical queries. The datasets are FB15k, FB15k-237, and NELL.

Regarding algorithmic efficiency, the results in Figure 2 show that the bidirectional sampler achieves a square-root reduction in computational cost compared with traversal, verifying the speedup of the proposed bidirectional sampler over the naive exhaustive traversal method.
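
As a back-of-the-envelope illustration of where the square root comes from (our reasoning, not the paper's exact analysis): for a d-hop query with average branching factor b, exhaustive traversal from the anchor visits on the order of b^d entities, whereas meeting at a node cut in the middle costs two traversals of roughly b^(d/2) each:

```latex
\underbrace{O\!\left(b^{d}\right)}_{\text{exhaustive traversal}}
\;\longrightarrow\;
\underbrace{O\!\left(b^{\lceil d/2\rceil} + b^{\lfloor d/2\rfloor}\right)}_{\text{bidirectional}}
= O\!\left(\sqrt{b^{d}}\right)
```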


Figure 2 Efficiency comparison between KG traversal and bidirectional sampling on different query structures

Since SMORE employs a query sampling scheme, negative samples can be shared across a batch of sampled queries, which significantly improves the end-to-end training efficiency of various query embedding methods. As shown in Table 1, the experiments show that SMORE is 119.4% faster on average and reduces GPU memory usage by 30.6%.


Table 1 Performance comparison between SMORE and KGReasoning on small KG

It is also found that training on multiple GPUs increases the training speed almost linearly with the number of GPUs, which indicates the effectiveness of the asynchronous training mechanism.


Table 2 Performance of each framework when running on Freebase KG

The results in Table 2 compare the runtime of SMORE on single-hop link prediction (KG completion) with state-of-the-art large-scale KG frameworks, including Marius, DGL-KE, and PBG. In the 1-GPU setting, SMORE is significantly faster than PBG but slightly slower than Marius. SMORE also scales better than the other systems, running significantly faster than DGL-KE and PBG in the multi-GPU setting, which demonstrates its superior runtime efficiency.

For prediction quality, the experimental results show that on FB15k, SMORE improves MRR by 3.54% with the Q2B model, demonstrating the system's effectiveness at improving KG reasoning. For query answering on large KGs, the baseline methods cannot scale to such KGs due to insufficient GPU memory and their computationally intensive exhaustive query sampling, whereas SMORE easily scales query embedding methods to these large KGs.

In addition, the experiments compare Q2B models trained with different samplers. The bidirectional sampler outperforms the exhaustive sampler (exhaustive traversal), while random sampling performs worst, which demonstrates the effectiveness of the bidirectional sampling strategy.

Summary

This paper studies knowledge graph completion and multi-hop reasoning. The authors propose a framework named SMORE, the first general framework supporting both single-hop and multi-hop reasoning on knowledge graphs, and it scales to large knowledge graphs. Key to SMORE's runtime performance is a novel bidirectional rejection sampling algorithm that achieves a square-root reduction in the complexity of online training data generation. In addition, SMORE overlaps CPU-based data sampling with GPU-based embedding computation through asynchronous scheduling, achieving comparable or even better runtime performance than state-of-the-art frameworks in both single-GPU and multi-GPU settings.


OpenKG

OpenKG (Chinese Open Knowledge Graph) aims to promote the openness, interconnection, and crowdsourcing of knowledge graph data centered on the Chinese language, and to promote the open-sourcing of knowledge graph algorithms, tools, and platforms.


