[Relation Extraction] Entity relation extraction explained in simple terms (introduction and common algorithms)


Background and Definition of Relation Extraction

        The concept of relation extraction (RE) was introduced at the MUC conference in 1988. It is one of the fundamental tasks of information extraction: its goal is to identify target relations between entities in text, and it is a key technical step in the construction of knowledge graphs.

        A knowledge graph is a network of semantically linked entities; it turns human knowledge of the physical world into structured semantic information that computers can understand. Relation extraction supplies the graph's edges by identifying the semantic relations between entities. Natural-language sentences come in endlessly varied forms, so relation extraction is considerably harder than entity extraction.

Figure 1 Example of relation extraction        

        Relation extraction discovers previously unknown relational facts in plain text and adds them to a knowledge graph, which makes it the key to automatically building large-scale knowledge graphs. Traditional methods lean heavily on feature engineering, while deep learning is reshaping both knowledge graph construction and text representation learning.

Detailed taxonomy of relation extraction 

        Relation extraction recovers the relations between entities from unstructured text. Depending on whether entities are already annotated in the text, relation extraction methods divide into joint extraction and pipelined extraction.

        Joint extraction performs entity recognition and relation classification together over the raw text. Pipelined extraction first runs an entity recognition model to find the entity pairs in the text, then classifies the relation between each pair. A complete pipelined relation extraction system comprises named entity recognition, entity linking, and relation classification.

        Relation extraction models fall broadly into three families: pattern-based methods, statistical machine learning, and neural networks, of which neural methods generally perform best. By the scope of the extracted text, models divide into sentence-level and paragraph-level relation extraction: sentence-level means the two entities appear in the same sentence, paragraph-level means they do not. Sentence-level relation extraction is the more common setting in real applications.

  Neural networks learn features automatically from large-scale data, and research on these methods mostly focuses on designing model structures that capture text semantics. The best current relation extraction models are supervised: they need large amounts of labeled data and can only extract predefined relations, so they struggle in complex real-world scenarios such as few-shot settings. A growing body of work now explores relation extraction under such real-world conditions.

The main tasks of relation extraction

Relation extraction mainly comprises two tasks:

(1) Relation classification

Given a predefined set of relations, decide which relation (if any) holds between an entity pair.

(2) Open relation extraction

Extract structured textual relations directly from text, then map them to canonical relations in a knowledge base.

Detailed Explanation of Classical Relation Extraction Algorithms and Models

(1) Rule-based relation extraction

  • Extraction based on trigger word patterns

        Many entity relations can be extracted with hand-written patterns that look for triples (X, α, Y), where X and Y are entities and α is the text between them. In "Paris is in France", for example, α = "is in". Such patterns can be implemented with regular expressions.
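Below is a minimal sketch of this idea in Python. It assumes, purely for illustration, that entity names are single capitalized words and that the trigger phrase is the literal text "is in"; real systems use richer patterns and an entity recognizer.

```python
import re

# Toy trigger-word pattern: "<X> is in <Y>", where X and Y are single
# capitalized words (an illustrative simplification, not a real NER step).
PATTERN = re.compile(r"\b([A-Z][a-z]+) is in ([A-Z][a-z]+)\b")

def extract_located_in(text):
    """Return (X, 'located_in', Y) triples matched by the pattern."""
    return [(x, "located_in", y) for x, y in PATTERN.findall(text)]

print(extract_located_in("Paris is in France. Berlin is in Germany."))
# [('Paris', 'located_in', 'France'), ('Berlin', 'located_in', 'Germany')]
```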

        These are word-sequence patterns: the rule specifies a pattern over the linear order of the words. Unfortunately, such rules cope poorly with longer-range patterns and more varied phrasing. "Fred and Mary got married", for example, cannot be handled by a word-sequence pattern.

        Instead, we can use the sentence's dependency paths, which tell us which word is grammatically dependent on which other word. This can greatly increase the coverage of the rules without extra effort.

        We can also transform sentences before applying the rules. For example, "The cake was baked by Harry" and "The cake which Harry baked" can both be normalized to "Harry baked the cake"; reordering the words lets our "linear" rules apply, and redundant modifiers in the middle can be dropped.

  • Extraction based on dependency parses (syntax trees)

        Build rules starting from the verb, constraining the part of speech of the nodes and the dependency labels on the edges.
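Here is a sketch of such a rule using spaCy's dependency parse: it starts from a verb node and constrains the dependency labels on its outgoing edges. It assumes the en_core_web_sm model is installed (python -m spacy download en_core_web_sm), and the exact parse can vary across model versions.

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def subject_verb_object(sentence):
    """Extract (subject, verb, object) triples from the dependency tree:
    start at each verb, follow nsubj and dobj edges to its arguments."""
    doc = nlp(sentence)
    triples = []
    for token in doc:
        if token.pos_ == "VERB":
            subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
            objects = [c for c in token.children if c.dep_ in ("dobj", "obj")]
            for s in subjects:
                for o in objects:
                    triples.append((s.text, token.lemma_, o.text))
    return triples

print(subject_verb_object("Harry baked the cake."))
# [('Harry', 'bake', 'cake')]
```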

        Advantages of rule-based relation extraction: humans can craft high-precision patterns, and the patterns can be tailored to specific domains. Disadvantages: hand-written patterns suffer from low recall (language is too varied), writing all the necessary rules takes a great deal of manual work, and rules must be created separately for every relation type.

(2) Supervised relation extraction

        Supervised neural methods use deep learning models trained on large labeled datasets. This family of methods currently performs best and receives the most research attention.

        A common approach to supervised relation extraction is to train a stack of binary classifiers (or a single multi-class classifier) that decide whether a specific relation holds between two entities. The classifiers take relevant features of the text as input, so the text must first be annotated by other NLP models. Typical features include context words, part-of-speech tags, the dependency path between the entities, NER tags, tokens, and the proximity between words.

Training and extraction then proceed as follows (see the scikit-learn sketch after this list):

(1) Manually annotate text data according to whether each sentence is relevant to a specific relation type. For the "CEO" relation, for example: "Apple CEO Steve Jobs said to Bill Gates." is relevant, while "Bob, Pie Enthusiast, said to Bill Gates." is not.

(2) Within the relevant sentences, manually label positive and negative entity-pair samples. In "Apple CEO Steve Jobs said to Bill Gates.", (Steve Jobs, CEO, Apple) is a positive sample and (Bill Gates, CEO, Apple) is a negative sample.

(3) Learn a binary classifier to determine whether a sentence is relevant to the relation type.

(4) Learn a binary classifier on the relevant sentences to determine whether a sentence expresses the relation.

(5) Use the classifiers to detect relations in new text data.
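The following scikit-learn sketch illustrates steps (3)-(5), collapsing the two binary classifiers into one for brevity; the tiny inline dataset and the "CEO" relation are illustrative only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training set for the "CEO" relation: 1 = the sentence expresses the
# relation for the marked entity pair, 0 = it does not.
sentences = [
    "Apple CEO Steve Jobs said to Bill Gates.",     # (Steve Jobs, CEO, Apple)
    "Microsoft CEO Satya Nadella gave a speech.",   # (Satya Nadella, CEO, Microsoft)
    "Bob, Pie Enthusiast, said to Bill Gates.",     # no CEO relation
    "Steve Jobs met Bill Gates in 1985.",           # no CEO relation
]
labels = [1, 1, 0, 0]

# Real systems would add POS tags, NER tags and dependency-path features;
# TF-IDF over context words keeps this sketch small.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(sentences, labels)

print(clf.predict(["Amazon CEO Andy Jassy said hello."]))
```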

        Fully supervised relation extraction has no entity recognition subtask, because the dataset already marks the subject and object entities; the task is therefore essentially classification. The typical model structure is a feature extractor plus a relation classifier, where the feature extractor can be a CNN, LSTM, GNN, Transformer, BERT, and so on.

Figure 2 An LSTM-based supervised relation extraction model
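As a schematic example of the feature extractor + relation classifier structure, here is a minimal PyTorch BiLSTM classifier; all dimensions, the vocabulary size, and the number of relation classes are placeholders, not values from any particular paper.

```python
import torch
import torch.nn as nn

class BiLSTMRelationClassifier(nn.Module):
    """Feature extractor (BiLSTM) + relation classifier (linear layer)."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_relations=10):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_relations)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len); the dataset marks the two entities
        # in the input, so no entity recognition is needed here.
        embedded = self.embedding(token_ids)       # (batch, seq, embed)
        hidden, _ = self.encoder(embedded)         # (batch, seq, 2*hidden)
        sentence_repr, _ = hidden.max(dim=1)       # max pooling over time
        return self.classifier(sentence_repr)      # relation logits

model = BiLSTMRelationClassifier(vocab_size=5000)
logits = model(torch.randint(0, 5000, (2, 20)))    # 2 sentences of 20 tokens
print(logits.shape)                                # torch.Size([2, 10])
```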

        Advantages of supervised relation extraction: a high-quality supervision signal (ensuring the extracted relations are relevant) and clearly defined negative samples. Disadvantages: labeling samples is expensive; adding a new relation is costly and difficult (a new classifier must be trained); models generalize poorly to new domains; and only a limited set of predefined relation types can be covered.

(3) Distant supervision models

Paper: "Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks"

Link: https://aclanthology.org/D15-1203.pdf

Figure 3 PCNN model architecture

         This paper moves relation extraction from fully supervised to distantly supervised data. Distant supervision generates a great deal of noisy, mislabeled data, and directly applying a supervised relation classifier to it performs very poorly. Moreover, most earlier methods relied on hand-built lexical and syntactic features and could not learn features automatically, and their accuracy drops sharply as feature structures such as syntax trees grow longer. The paper therefore adopts the at-least-one assumption of multi-instance learning to address the first problem, and modifies the pooling of Zeng et al.'s 2014 CNN into piecewise max pooling to address the second.
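The piecewise max pooling that gives PCNN its name splits the convolution output into three segments at the two entity positions and max-pools each segment separately. A sketch, with illustrative shapes:

```python
import torch

def piecewise_max_pool(conv_out, e1_pos, e2_pos):
    """conv_out: (channels, seq_len) feature map for one sentence.
    Split the sequence at the two entity positions, max-pool each of
    the three segments, and concatenate -> (3 * channels,) vector.
    Assumes e1_pos < e2_pos < seq_len - 1 so no segment is empty."""
    segments = [conv_out[:, :e1_pos + 1],
                conv_out[:, e1_pos + 1:e2_pos + 1],
                conv_out[:, e2_pos + 1:]]
    pooled = [seg.max(dim=1).values for seg in segments]
    return torch.cat(pooled)

features = torch.randn(230, 40)   # 230 conv filters, sentence length 40 (toy sizes)
print(piecewise_max_pool(features, 5, 20).shape)   # torch.Size([690])
```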

         Training a strong relation extraction model requires large amounts of high-quality data, but building such datasets demands extensive manual annotation, which is slow and laborious. Mintz et al. [22] were the first to generate labeled data with distant supervision. The distant supervision hypothesis states: if two entities participate in a relation, then any sentence containing both entities expresses that relation. For example, if Ra(e1, e2) holds, every sentence containing both e1 and e2 is assumed to express Ra and is labeled as a positive sample for Ra. With this method, a knowledge base and a text corpus are all that is needed to obtain labeled data automatically.
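A minimal sketch of this labeling step; the toy knowledge base and corpus are invented for illustration, and note how the alignment already produces the noise discussed next.

```python
# Toy knowledge base and corpus (illustrative only).
kb = {("Steve Jobs", "Apple"): "founder_of"}
corpus = [
    "Steve Jobs founded Apple in 1976.",
    "Steve Jobs presented the new Apple product.",  # expresses no founding relation
    "Bill Gates founded Microsoft.",
]

def distant_label(kb, corpus):
    """Label every sentence containing both entities as a positive sample."""
    samples = []
    for (e1, e2), relation in kb.items():
        for sentence in corpus:
            if e1 in sentence and e2 in sentence:
                samples.append((sentence, e1, relation, e2))
    return samples

for s in distant_label(kb, corpus):
    print(s)
# The second sentence is wrongly labeled founder_of --
# exactly the labeling-error problem described below.
```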

        Distant supervision looks like a perfect answer to the shortage of supervised data, but distantly supervised data suffers from three problems: 1) not every sentence containing both e1 and e2 actually expresses Ra, so the dataset contains many labeling errors; 2) it cannot handle an entity pair that holds multiple relations, because a knowledge graph allows only one edge between two nodes, so multiple relations between one pair cannot be modeled; 3) false negatives: an instance labeled as a negative sample may actually express a relation that is simply missing from the knowledge graph. The third problem can be alleviated by better negative sampling, for example choosing sentences whose entity pairs clearly stand in no relation. The first problem is the most serious, and much current research targets it.

        There are three main ways to mitigate the noise in distantly supervised data: 1) multi-instance learning, which selects the most informative instances from each bag as training samples; 2) using external information to select reliable instances; 3) using more sophisticated models and training schemes, such as soft labels, reinforcement learning, and adversarial learning.
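A sketch of the at-least-one selection used in multi-instance learning: within each entity-pair bag, only the sentence the current model scores highest for the bag's labeled relation is kept as the training instance. Sizes are illustrative.

```python
import torch

def select_best_instance(bag_logits, relation_id):
    """bag_logits: (num_sentences, num_relations) scores for one bag.
    Return the index of the sentence the model currently treats as the
    strongest evidence for the bag's relation (at-least-one assumption)."""
    return bag_logits[:, relation_id].argmax().item()

bag = torch.randn(4, 53)   # 4 sentences in the bag, 53 relation classes (toy sizes)
best = select_best_instance(bag, relation_id=7)
print(f"train on sentence {best} of this bag")
```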

(4) Joint relation extraction

        In a parameter-sharing joint model, extraction of the (s, p, o) triple proceeds in several steps, and the loss of the whole model is the sum of the losses of the individual steps. During backpropagation, the parameters of every part of the model are updated together, and later steps can use the outputs of earlier steps as features (whereas the sub-steps of a pipeline model are completely decoupled). Most current state-of-the-art methods take this approach.
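A schematic sketch of parameter sharing (not any specific paper's architecture): one shared encoder feeds an entity head and a relation head, and the summed loss updates the shared parameters with signal from both tasks at once. All sizes and label counts are placeholders.

```python
import torch
import torch.nn as nn

class JointREModel(nn.Module):
    """Shared encoder with separate entity and relation heads."""

    def __init__(self, vocab_size=5000, embed_dim=100, hidden_dim=128,
                 num_entity_tags=9, num_relations=10):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        self.entity_head = nn.Linear(2 * hidden_dim, num_entity_tags)
        self.relation_head = nn.Linear(2 * hidden_dim, num_relations)

    def forward(self, token_ids):
        hidden, _ = self.encoder(self.embedding(token_ids))
        entity_logits = self.entity_head(hidden)               # per-token tags
        relation_logits = self.relation_head(hidden.max(dim=1).values)
        return entity_logits, relation_logits

model = JointREModel()
entity_logits, relation_logits = model(torch.randint(0, 5000, (2, 20)))

# The joint loss is the sum of the sub-task losses; one backward pass
# updates the shared encoder with signal from both tasks (random targets
# stand in for real labels here).
entity_loss = nn.functional.cross_entropy(
    entity_logits.reshape(-1, 9), torch.randint(0, 9, (40,)))
relation_loss = nn.functional.cross_entropy(
    relation_logits, torch.randint(0, 10, (2,)))
(entity_loss + relation_loss).backward()
```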

        The joint-decoding flavor of joint models is closer to the spirit of "joint": it does not split extraction into separate entity recognition and relation classification sub-steps, but recognizes the whole (s, p, o) triple in a single step, achieving true information sharing between the tasks (its weakness: it cannot recognize overlapping entity relations).

 Figure 4 Types of joint relation extraction models

(1) A classic model using parameter sharing

Paper: "End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures"

Link: https://aclanthology.org/P16-1105.pdf

      The model contains two BiLSTMs: one over the word sequence, used mainly for entity detection, and one over the tree structure, used mainly for relation extraction. The latter is stacked on the former, taking the former's output and hidden states as part of its input.

        This paper proposes a new end-to-end model for extracting relations between entities. It jointly models entities and relations with a bidirectional sequential LSTM-RNN (left-to-right and right-to-left) and a bidirectional tree-structured LSTM-RNN (bottom-up and top-down). Entities are detected first, then the relations between the detected entities are extracted with an incrementally decoded network structure, and the network parameters are updated jointly using both entity and relation labels. Unlike traditional end-to-end extraction models, training also includes two enhancements: entity pre-training (pre-training the entity model) and scheduled sampling, which replaces (unreliable) predicted labels with gold labels with a certain probability. These enhancements mitigate the poor performance of entity detection early in training.

        The model consists of three representation layers: a word embedding layer (embedding layer), a word-sequence LSTM-RNN layer (sequence layer), and a dependency-subtree LSTM-RNN layer (dependency layer). During decoding, entities are detected left to right on the sequence layer with a greedy strategy. On the dependency layer, dependency embeddings and the shortest path between the entity pair in the tree-LSTM assist relation classification. Because the dependency layer is stacked on the sequence layer, the shared parameters are shaped by both entity labels and relation labels.

(2) A classic model using joint decoding

Paper: "Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme"

Link: https://aclanthology.org/P17-1113.pdf

        This approach casts entity recognition and relation classification as a single sequence labeling problem: an end-to-end model encodes the sentence, feeds the hidden vectors to a decoder, and obtains the (s, p, o) triples directly, without splitting extraction into separate entity recognition and relation classification sub-steps.

        The paper converts joint entity-relation extraction into a novel tagging scheme that models triples directly, instead of handling entities and relations in separate stages as in earlier work. The tagging scheme also encodes the direction of each relation, and a biased loss function is designed for it. This offers a useful lesson: more complex models do not necessarily perform better, especially in industrial applications where costs are hard to predict, while a clever task reformulation can keep the model lightweight and still achieve strong results.
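A toy illustration in the spirit of such a tagging scheme: each tag combines the token's position in an entity span (B/I/E/S), a relation type, and the argument role (1 or 2). The sentence, the "CP" (Country-President) relation, and the decoder below are illustrative, not the paper's exact implementation.

```python
# Each tag = position-in-entity + relation type + argument role, e.g.
# "B-CP-1" = Begin of the first argument of a Country-President relation.
sentence = ["The", "United", "States", "president", "Trump", "visited", "Paris"]
tags     = ["O",   "B-CP-1", "E-CP-1", "O",         "S-CP-2", "O",       "O"]

def decode_triples(tokens, tags):
    """Pair up argument-1 and argument-2 spans that share a relation type."""
    spans = {}   # relation type -> {role: entity string}
    current = []
    for tok, tag in zip(tokens, tags):
        if tag == "O":
            continue
        pos, rel, role = tag.split("-")
        if pos in ("B", "S"):
            current = [tok]
        else:
            current.append(tok)
        if pos in ("E", "S"):
            spans.setdefault(rel, {})[role] = " ".join(current)
    return [(v["1"], rel, v["2"]) for rel, v in spans.items()
            if "1" in v and "2" in v]

print(decode_triples(sentence, tags))
# [('United States', 'CP', 'Trump')]
```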

(3) Pre-trained model + relation classification

        Input layer (BERT): use the special symbols $ and # to mark the boundaries and positions of the two entities;

        After BERT feature extraction, two kinds of features are used: the embedding at the [CLS] position and the embeddings corresponding to the two entities;

        Concatenate the three features above, then feed them through a fully connected layer and a softmax layer to output the relation classification.
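A sketch of this construction using the HuggingFace transformers library; it assumes bert-base-uncased is available, that "$" and "#" each tokenize to a single vocabulary token, and it simplifies the per-feature transformations to plain averaging and concatenation.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

# Mark entity boundaries with $ ... $ and # ... # before encoding.
text = "$ Steve Jobs $ was the CEO of # Apple # ."
inputs = tokenizer(text, return_tensors="pt")
hidden = bert(**inputs).last_hidden_state[0]           # (seq_len, 768)

# Locate the two marker pairs in the token sequence.
ids = inputs["input_ids"][0].tolist()
dollar = tokenizer.convert_tokens_to_ids("$")
hash_ = tokenizer.convert_tokens_to_ids("#")
d1, d2 = [i for i, t in enumerate(ids) if t == dollar]
h1, h2 = [i for i, t in enumerate(ids) if t == hash_]

# Three features: the [CLS] vector plus the averaged vectors of each
# marked entity span (markers excluded).
cls_vec = hidden[0]
e1_vec = hidden[d1 + 1:d2].mean(dim=0)
e2_vec = hidden[h1 + 1:h2].mean(dim=0)

classifier = nn.Linear(3 * 768, 10)                    # 10 toy relation classes
logits = classifier(torch.cat([cls_vec, e1_vec, e2_vec]))
print(logits.shape)                                    # torch.Size([10])
```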

(4) Pre-trained model + joint extraction

        A single model obtains both the entities in the input text and the relations between them; it consists of an entity extraction module, a relation classification module, and a shared feature extraction module.

        In the relation classification module, BERT encodes the input sequence into a feature sequence; the NER module's output is passed through an argmax to give a label sequence of the same length as the input, which is then mapped to fixed-dimension embeddings; the concatenated vectors pass through an FFN layer and a biaffine classifier that predicts the relations between entities.
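A sketch of a biaffine relation scorer of the kind mentioned above: a bilinear term plus a linear term over the two entity vectors yields one score per relation. All sizes are placeholders.

```python
import torch
import torch.nn as nn

class BiaffineRelationScorer(nn.Module):
    """Score relations between an entity pair with a biaffine transform:
    score(h1, h2) = h1^T U h2 + W [h1; h2] + b."""

    def __init__(self, hidden_dim=256, num_relations=10):
        super().__init__()
        self.U = nn.Parameter(torch.randn(num_relations, hidden_dim, hidden_dim) * 0.01)
        self.W = nn.Linear(2 * hidden_dim, num_relations)   # includes the bias b

    def forward(self, h1, h2):
        # h1, h2: (batch, hidden_dim) entity representations from the encoder
        bilinear = torch.einsum("bi,rij,bj->br", h1, self.U, h2)
        linear = self.W(torch.cat([h1, h2], dim=-1))
        return bilinear + linear                            # (batch, num_relations)

scorer = BiaffineRelationScorer()
h1, h2 = torch.randn(4, 256), torch.randn(4, 256)
print(scorer(h1, h2).shape)                                 # torch.Size([4, 10])
```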

