Path-Ranking in KBQA: path generation, recall, coarse and fine ranking

Path generation: The entities in the question are obtained via entity linking. Path generation starts from these entities, traverses the KG to generate all possible answer paths, and prunes paths along the way.

1. Path recall

The path recall strategy covers two cases: single-entity and multi-entity. Multi-entity recall takes priority: if a path exists between multiple entities, single-entity recall is skipped.

1.1 Single-entity path recall strategy:

(1) One-hop path with the entity as head: <entity><relation><?x>;

(2) One-hop path with the entity as tail: <?x><relation><entity>;

(3) Extend (1) with an outgoing second hop: <entity><relation1><?x>, <?x><relation2><?y>;

(4) Extend (1) with an incoming second hop: <entity><relation1><?x>, <?y><relation2><?x>;

(5) Extend (2) with an outgoing second hop: <?x><relation1><entity>, <?x><relation2><?y>;

(6) Extend (2) with an incoming second hop: <?x><relation1><entity>, <?y><relation2><?x>.
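The six templates above can be written down as SPARQL basic graph patterns. A minimal sketch, where the entity IRI and function name are illustrative and not tied to any particular KG:

```python
# Single-entity recall templates (1)-(6), as SPARQL basic graph patterns.
# ?x / ?y are the candidate-answer variables from the text above.
SINGLE_ENTITY_TEMPLATES = [
    "SELECT ?x WHERE {{ <{e}> ?r ?x . }}",               # (1) entity as head
    "SELECT ?x WHERE {{ ?x ?r <{e}> . }}",               # (2) entity as tail
    "SELECT ?y WHERE {{ <{e}> ?r1 ?x . ?x ?r2 ?y . }}",  # (3) outgoing second hop
    "SELECT ?y WHERE {{ <{e}> ?r1 ?x . ?y ?r2 ?x . }}",  # (4) incoming second hop
    "SELECT ?y WHERE {{ ?x ?r1 <{e}> . ?x ?r2 ?y . }}",  # (5) outgoing second hop
    "SELECT ?y WHERE {{ ?x ?r1 <{e}> . ?y ?r2 ?x . }}",  # (6) incoming second hop
]

def single_entity_queries(entity_iri: str) -> list[str]:
    """Instantiate all six single-entity recall templates for one linked entity."""
    return [t.format(e=entity_iri) for t in SINGLE_ENTITY_TEMPLATES]
```

Each recalled query is then executed against the KG endpoint, and the bindings of the answer variable become candidate paths.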

1.2 Multi-entity path recall strategy (taking two entities as an example):

(1) One-hop paths with both entities as heads: <entity1><relation1><?x>, <entity2><relation2><?x>;

(2) One-hop paths with both entities as tails: <?x><relation1><entity1>, <?x><relation2><entity2>;

(3) One-hop paths with entity1 as tail and entity2 as head: <?x><relation1><entity1>, <entity2><relation2><?x>;

(4) One-hop paths with entity1 as head and entity2 as tail: <entity1><relation1><?x>, <?x><relation2><entity2>;

(5) Extend (1) (likewise (2), (3), (4)) with an outgoing second hop: <entity1><relation1><?x>, <entity2><relation2><?x>, <?x><relation3><?y>;

(6) Extend (1) (likewise (2), (3), (4)) with an incoming second hop: <entity1><relation1><?x>, <entity2><relation2><?x>, <?y><relation3><?x>;

(7) The relation between the two entities: <entity1><?x><entity2>.
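The two-entity templates can be sketched the same way. Templates (1)-(4) constrain a shared answer variable ?x with one triple per entity, and (7) queries the relation between the entities directly; the second-hop extensions (5)-(6) follow the single-entity pattern and are omitted here. IRIs and names are again illustrative:

```python
# Two-entity recall templates (1)-(4) and (7) as SPARQL basic graph patterns.
# (5)/(6) would append "?x ?r3 ?y ." or "?y ?r3 ?x ." to (1)-(4).
TWO_ENTITY_TEMPLATES = [
    "SELECT ?x WHERE {{ <{e1}> ?r1 ?x . <{e2}> ?r2 ?x . }}",  # (1) both heads
    "SELECT ?x WHERE {{ ?x ?r1 <{e1}> . ?x ?r2 <{e2}> . }}",  # (2) both tails
    "SELECT ?x WHERE {{ ?x ?r1 <{e1}> . <{e2}> ?r2 ?x . }}",  # (3) mixed
    "SELECT ?x WHERE {{ <{e1}> ?r1 ?x . ?x ?r2 <{e2}> . }}",  # (4) mixed
    "SELECT ?r WHERE {{ <{e1}> ?r <{e2}> . }}",               # (7) direct relation
]

def two_entity_queries(e1: str, e2: str) -> list[str]:
    """Instantiate the two-entity recall templates for a pair of linked entities."""
    return [t.format(e1=e1, e2=e2) for t in TWO_ENTITY_TEMPLATES]
```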

2. Pruning

To avoid an explosion in the number of candidate answers, we prune according to the following strategy.

(1) In one-hop paths, delete any path whose answer entity is the topic entity itself, so the topic entity is never returned as the answer;

(2) If a node has more than 10,000 second-hop neighbors, skip the second-hop expansion for that node;

(3) In two-hop paths, likewise delete any path whose answer entity is the topic entity;

(4) When the number of two-hop paths (outgoing or incoming) exceeds 100 but is below 500, delete candidate answer paths whose two-hop relation shares no characters with the question;

(5) When the number of two-hop paths (outgoing or incoming) exceeds 500, discard all two-hop paths.
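Rules (1) and (3)-(5) can be sketched as a post-recall filter, assuming each candidate is a small dict whose field names are illustrative. Rule (2) has to be enforced inside the graph traversal itself, so it only appears as a comment:

```python
def prune(candidates, topic_entities, question, char_filter_lo=100, drop_all_hi=500):
    """Apply pruning rules (1), (3)-(5) to recalled candidate paths.

    Rule (2) -- skipping second-hop expansion when a node has over 10,000
    second-hop neighbors -- belongs in the traversal and is not shown here.
    Each candidate: {"answer": str, "relations": [str, ...], "hops": int}.
    """
    # Rules (1) and (3): never answer with the topic entity itself.
    kept = [c for c in candidates if c["answer"] not in topic_entities]

    two_hop = [c for c in kept if c["hops"] == 2]
    if len(two_hop) > drop_all_hi:
        # Rule (5): too many two-hop paths; keep one-hop paths only.
        return [c for c in kept if c["hops"] == 1]
    if char_filter_lo < len(two_hop) <= drop_all_hi:
        # Rule (4): keep a two-hop path only if some relation in it shares
        # at least one character with the question.
        def overlaps(c):
            return any(set(r) & set(question) for r in c["relations"])
        kept = [c for c in kept if c["hops"] == 1 or overlaps(c)]
    return kept
```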

3. Path ranking

Path ranking proceeds in two stages: coarse ranking followed by fine ranking.

3.1 Coarse ranking

Candidate paths can be coarsely ranked using features of the question (query) and the candidate paths. Machine-learning models such as LightGBM or XGBoost can be used, keeping the top-20 paths. The following features are useful for feature engineering:

  1. Character features
    • number of overlapping characters
    • number of overlapping words
    • character-level Jaccard similarity (analogous to IoU)
    • word-level Jaccard similarity (analogous to IoU)
    • edit distance
    • whether all characters of the path appear in the query
  2. Path-intrinsic features
    • number of answers
    • number of hops in the path
    • number of entities in the path
    • number of relations in the path
    • length of the path
  3. Semantic features
    • character-level vector similarity
    • word-level vector similarity (jieba can be used for word segmentation)
    • bi-gram-level vector similarity (an Aho-Corasick automaton can be used)
  4. Popularity features
    • frequency of the answer in the KG
    • number of distinct one-hop relations of the answer
  5. Numeric features
    • number of overlapping numbers
    • Jaccard similarity of numbers between query and path
    • whether all numbers in the path appear in the query
  6. Other features
    • whether the candidate answer appears in the query
    • whether the relations in the path appear in the query
    • whether the intent of the path appears in the query
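A few of the character-level features above can be sketched as follows; the resulting feature dicts would then be assembled into vectors and fed to a LightGBM/XGBoost ranker. Function and key names are illustrative:

```python
def char_jaccard(a: str, b: str) -> float:
    """Character-level Jaccard similarity: |intersection| / |union| of char sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def edit_distance(a: str, b: str) -> int:
    """Levenshtein edit distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[-1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def path_features(query: str, path: str) -> dict:
    """A handful of the character features listed above; the word, semantic,
    popularity, and numeric features are built in the same style."""
    return {
        "char_overlap": len(set(query) & set(path)),
        "char_jaccard": char_jaccard(query, path),
        "edit_distance": edit_distance(query, path),
        "path_chars_in_query": set(path) <= set(query),
        "path_length": len(path),
    }
```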

3.2 Fine ranking

The main purpose of using LightGBM or a similar machine-learning model in 3.1 is to shrink the candidate set before fine ranking. During fine ranking, a pre-trained language model can be used to compute the semantic match between the query and each path, and the highest-scoring path is selected as the answer.
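A minimal sketch of the coarse-to-fine handoff. The `semantic_score` below is only a character-overlap stand-in so the sketch is self-contained; in a real system it would be a pre-trained cross-encoder (e.g. BERT over a `[CLS] query [SEP] path` pair) fine-tuned to output a match score:

```python
def semantic_score(query: str, path: str) -> float:
    """Placeholder scorer. Replace with a pre-trained language model
    (e.g. a BERT cross-encoder over "[CLS] query [SEP] path") fine-tuned
    on (query, gold path) pairs to output a match probability."""
    qs, ps = set(query), set(path)
    return len(qs & ps) / max(len(qs | ps), 1)

def fine_rank(query: str, coarse_top20: list[str]) -> str:
    """Score each coarse-ranked candidate path against the query and
    return the path with the highest semantic match."""
    return max(coarse_top20, key=lambda p: semantic_score(query, p))
```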


Official Account: Natural Language Processing and Deep Learning


Origin blog.csdn.net/yjh_SE007/article/details/127068875