Source: AINLPer WeChat public account (daily paper digests shared!)
Editor: ShuYini
Proofreading: ShuYini
Time: 2022-09-29
Introduction
A Knowledge Graph (KG) stores data as triples in a graph database. The core of knowledge graph question answering is converting natural language into a query language over graph data, that is, mapping natural language questions to structured queries. When some links are missing from a KG, it is difficult to identify the correct answer; if you are working on this problem, this article may help. The paper and source code are linked at the end.
Background introduction
Multi-hop reasoning over Knowledge Graphs (KGs), which aims to find answer entities for a given query using knowledge in KGs, has received extensive attention from academia and industry in recent years. In general, it involves answering first-order logic (FOL) queries on KGs using operators including existential quantification (∃), conjunction (∧), disjunction (∨), and negation (¬). A common method for multi-hop reasoning is to first convert a FOL query into a corresponding computation graph (where each node represents a set of entities and each edge represents a logical operation), and then traverse the KG according to the computation graph to identify the answer set. However, this approach faces two major challenges. First, it is difficult to identify the correct answer when some links are missing from the KG. Second, it needs to process all intermediate entities on the inference path, which may lead to exponential computational cost.
To address these issues, researchers have paid increasing attention to query embedding (QE) techniques, which embed entities and FOL queries into a low-dimensional space. A QE model associates each logical operator in the computation graph with a logical operator in the embedding space. Given a query, the QE model generates a query embedding based on the corresponding computation graph, and then determines whether an entity is a correct answer based on the similarity between the query embedding and the entity embedding.
Among existing QE models, geometry-based models, which embed entities and queries as geometric shapes, have shown promising performance. Geometry-based models typically represent entity sets as "regions" (e.g., points and boxes) in Euclidean space, and then perform set operations on them. For example, Query2Box represents entities as points and queries as boxes: if a point lies inside a box, the corresponding entity is an answer to the query. Compared with non-geometric methods, geometric shapes provide a natural and easily interpretable way of representing sets and their logical relationships.
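To make the point-in-box criterion concrete, here is a minimal sketch (not Query2Box's actual implementation, which learns centers and offsets jointly with entity embeddings): an entity point answers a box query iff every coordinate lies within the box's half-width of its center.

```python
import numpy as np

# A box query is a (center, offset) pair; an entity is a point.
# The entity answers the query iff its point lies inside the box.
def in_box(entity, center, offset):
    """True iff `entity` lies inside the axis-aligned box."""
    return bool(np.all(np.abs(entity - center) <= offset))

center = np.array([0.0, 0.0])
offset = np.array([1.0, 2.0])  # half-width per dimension
print(in_box(np.array([0.5, -1.5]), center, offset))  # True: inside
print(in_box(np.array([1.5, 0.0]), center, offset))   # False: outside on dim 0
```

The crucial point for the discussion below: the set of points *outside* such a box is not itself a box, which is why this representation cannot express negation.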
However, existing geometry-based models struggle to model queries with negation, which greatly limits their applicability. For example, GQE and Query2Box, which embed queries as points and boxes respectively, cannot handle queries with negation, because the complement of a point/box is no longer a point/box. To address this issue, Ren & Leskovec proposed a probabilistic QE model based on the Beta distribution. However, it loses some advantages of geometric models: for example, with the Beta distribution it is not clear how to decide whether an entity is an answer to a query, the way box containment decides it. Therefore, it remains challenging to propose a geometric QE model capable of handling all FOL queries.
ConE model introduction
In this paper, we propose a new query embedding model, Cone Embeddings (ConE), to answer multi-hop first-order logic (FOL) queries over knowledge graphs. We represent entity sets as Cartesian products of sector cones and design corresponding logical operators.
Cone Embeddings
Given a query, we represent the reasoning process as a computation graph (Figure a above), where nodes represent entity sets and edges represent logical operations on entity sets. Figure b above gives several examples of (sector) cones. One may notice similarities between sector cones and the boxes defined in Query2Box, both of which are region representations; however, sector cones are more expressive than boxes.
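To illustrate, here is a minimal sketch of sector-cone membership in one dimension, assuming the common parameterization of a cone by an axis angle and an aperture (the function names are illustrative, not from the paper's code). In ConE, an embedding is a Cartesian product of such cones, so an entity is in a query region iff each of its coordinate angles falls in the corresponding cone.

```python
import math

def angle_diff(a, b):
    """Smallest absolute angular difference between angles a and b (radians)."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def in_cone(phi, axis, aperture):
    """A point at angle `phi` lies inside a sector cone iff its angular
    distance to the cone's axis is at most half the aperture."""
    return angle_diff(phi, axis) <= aperture / 2

# A cone with axis 0 and aperture pi covers the angles in [-pi/2, pi/2].
print(in_cone(0.3, 0.0, math.pi))  # True
print(in_cone(2.0, 0.0, math.pi))  # False
```

Note that an aperture of 2π covers the whole circle and 0 degenerates to a ray, so cones can represent both the full entity set and a single "point", which is part of why they are more expressive than boxes.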
Cone logical operations
Projection Operator P: As shown in Figure a above, the goal of P is to find the entities adjacent to a given entity set via a given relation; it maps one entity set to another.
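A hedged sketch of what a projection operator can look like: the paper implements P with learned neural transformations of cone parameters, and the small MLP below (the names `mlp` and `project` and the exact squashing functions are illustrative assumptions, not the paper's architecture) only shows the shape of such a map, with outputs pushed back into valid axis/aperture ranges.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, W1, b1, W2, b2):
    """Two-layer MLP with ReLU, transforming concatenated cone parameters."""
    h = np.maximum(0.0, x @ W1 + b1)
    return h @ W2 + b2

def project(axis, aperture, params):
    """Map a cone (axis, aperture) to the cone of entities reached via a
    relation. Outputs are squashed into valid ranges: axis in (-pi, pi),
    aperture in (0, 2*pi)."""
    out = mlp(np.concatenate([axis, aperture]), *params)
    d = axis.shape[0]
    new_axis = np.pi * np.tanh(out[:d])            # keep axis in (-pi, pi)
    new_aperture = np.pi * (np.tanh(out[d:]) + 1)  # keep aperture in (0, 2*pi)
    return new_axis, new_aperture

d = 4  # number of cones in the Cartesian product (toy size)
params = (rng.normal(size=(2 * d, 16)), np.zeros(16),
          rng.normal(size=(16, 2 * d)), np.zeros(2 * d))
axis = rng.uniform(-np.pi, np.pi, d)
aperture = rng.uniform(0, 2 * np.pi, d)
new_axis, new_aperture = project(axis, aperture, params)
print(new_axis.shape, new_aperture.shape)  # (4,) (4,)
```

In practice such parameters would be relation-specific and trained end to end; here they are random, so only the shapes and ranges of the output are meaningful.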
Intersection Operator I: As shown in Figure b above, given a query q whose conjunct sub-queries have entity sets $[q_1], [q_2], \dots, [q_n]$, the purpose of operator I is to compute their intersection: $[q] = \bigcap_{i=1}^{n} [q_i]$.
Union Operator U: As shown in Figure c above, given a query q whose disjunct sub-queries have separate entity sets $[q_1], [q_2], [q_3]$, the purpose of operator U is to unite these scattered sets: $[q] = \bigcup_{i=1}^{n} [q_i]$.
Complement Operator C: As shown in Figure d above, given a query q and its corresponding entity set [q], the purpose of operator C is to identify $[\neg q]$, the complement of [q].
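The property that makes negation work geometrically is closure: the complement of a sector cone is again a sector cone, obtained by rotating the axis by π and keeping the remaining aperture. A minimal sketch of this closure (the wrap-to-range arithmetic is an implementation detail of this sketch, not the paper's code):

```python
import math

def complement(axis, aperture):
    """Complement of a sector cone is again a sector cone:
    flip the axis by pi and take the remaining aperture."""
    new_axis = ((axis + 2 * math.pi) % (2 * math.pi)) - math.pi  # wrap to [-pi, pi)
    return new_axis, 2 * math.pi - aperture

# Complementing twice recovers the original aperture (an involution).
ax2, ap2 = complement(*complement(0.0, math.pi / 2))
print(math.isclose(ap2, math.pi / 2))  # True
```

This is exactly what points and boxes lack: the set outside a box is not a box, but the set outside a cone is a cone, so C stays inside the representation.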
Distance Function: We define a distance function for queries. Inspired by Ren et al., we split the distance d into two parts: the outside distance $d_o$ and the inside distance $d_i$. Figure e above illustrates the distance function d.
Experiment snapshot
In this section, we demonstrate experimentally that: 1) ConE is a powerful model for multi-hop reasoning over knowledge graphs; 2) ConE's cone embeddings can effectively model the cardinality (i.e., the number of elements) of an answer set.
1. The following table shows the experimental results on queries without negation, i.e., existential positive first-order (EPFO) queries, where AVG denotes average performance. Overall, ConE significantly outperforms the comparison models.
2. The following table shows the results of ConE and BetaE on FOL queries with negation. Since GQE and Q2B cannot handle the negation operator, their results are not included. Overall, ConE performs significantly better than BetaE.
Paper && source code
Title: ConE: Cone Embeddings for Multi-Hop Reasoning over Knowledge Graphs
Authors: University of Science and Technology of China (USTC)
Paper: https://arxiv.org/pdf/2110.13715.pdf
Code: https://github.com/MIRALab-USTC/QE-ConE
Last but not least
Follow the AINLPer WeChat public account (the latest papers recommended to you every day!)