PL-Marker (ACL 2022): New SOTA for Information Extraction (NER + RE), Paper Analysis and Code Walkthrough

Motivation: previous work on entity and relation extraction has focused on obtaining better span representations from pre-trained encoders, but it ignores the interrelation between spans (and span pairs).
Contribution: a new span representation method called Packed Levitated Markers (PL-Marker) is proposed.

  • It considers the interrelation between spans (and span pairs) by strategically packing the markers in the encoder.
  • A neighborhood-oriented packing strategy is proposed, which considers the neighboring spans as a whole to better model entity boundary information.
  • For more complex span-pair classification tasks, the authors design a subject-oriented packing strategy, which packs each subject together with all of its objects to model the interrelation between span pairs sharing the same subject.

"Packed Levitated Marker for Entity and Relation Extraction" (ACL 2022), code: https://github.com/thunlp/PL-Marker

Preface: Introduction to Related Work

  • Current relation extraction algorithms, classified by processing style:
    • Pipeline approach: first extract the entities, then classify the relations
      • e.g. "A Frustratingly Easy Approach for Joint Entity and Relation Extraction" by Danqi Chen's group (hereinafter PURE)
    • Joint entity and relation extraction: joint models that recast the joint task as:
      • a table-filling problem, e.g. table-sequence, TPLinker
      • a sequence-labeling problem, e.g. ETL-Span, PRGC
      • a seq2seq problem, e.g. SPN4RE

Currently, there are three main methods for span representation extraction: T-Concat, Solid Marker, and Levitated Marker.

  1. T-Concat
    T-Concat concatenates the embeddings of the span's boundary tokens (start and end) as the span embedding. This stays at the token level and ignores the interaction between the boundary tokens. It is the method used by the NER model in Danqi Chen's PURE.

  2. Solid Marker
    This method explicitly inserts a pair of solid markers at the start and end of a span to highlight it in the input text; for a subject-object span pair, it inserts one pair of solid markers around the subject span and another around the object span. It has difficulty handling multiple subject-object pairs, because it cannot distinguish the different subjects and objects in the same sentence, nor can it handle overlapping spans. This is what PURE-Full uses.

  3. Levitated Marker
    This method first sets up a pair of levitated markers that share the same position information as the span's boundary tokens, and then ties the pair together through a directed attention mechanism. Concretely, the two markers of a pair are set visible to each other in the attention mask matrix, but invisible to the text tokens and to the markers of other pairs (a minimal sketch of such a mask follows this list).
    The RE model in PURE uses this method: it simply replaces the solid markers with levitated markers carrying entity type information. So far this method is only used in PURE-Approx, and its performance is lower than PURE-Full's. The paper argues that simply placing levitated markers after the sentence fails to model the interrelation among multiple spans.
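
As a rough illustration of the directed attention just described, here is a minimal PyTorch sketch (the packing layout and all names are my own assumptions, not the paper's exact implementation) that builds a mask where text tokens attend only to text, while each levitated marker pair attends to the text and to itself:

import torch

def build_levitated_attention_mask(text_len, num_pairs):
    """Sketch of a directed attention mask for levitated markers.

    Assumed layout: [text tokens | start markers | end markers],
    i.e. pair i's start marker sits at text_len + i and its end
    marker at text_len + num_pairs + i.
    Returns (total_len, total_len); 1 = may attend, 0 = blocked.
    """
    total_len = text_len + 2 * num_pairs
    mask = torch.zeros(total_len, total_len, dtype=torch.long)
    mask[:text_len, :text_len] = 1  # text sees only text
    for i in range(num_pairs):
        s = text_len + i              # start marker of pair i
        e = text_len + num_pairs + i  # end marker of pair i
        mask[s, :text_len] = 1  # each marker sees all text tokens,
        mask[e, :text_len] = 1
        mask[s, [s, e]] = 1     # itself, and its partner marker,
        mask[e, [s, e]] = 1     # but no marker of any other pair
    return mask

print(build_levitated_attention_mask(text_len=6, num_pairs=2))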

PL-Marker is a pipeline method that innovates on top of PURE, so we first review PURE below; for a more detailed analysis of the PURE paper, see the link at the end of this subsection.
PURE

  • Ideas:
  1. First extract entities: feed the text into a PLM to obtain the contextual representation of each token, concatenate the contextual representations of each span's start token and end token with an embedding of the span length to obtain the span representation, feed it into a two-layer feed-forward network, and finally predict the entity type.
  2. Then classify relations: "typed markers" are inserted before and after the subject span and the object span in the sentence. A typed marker is a pair of markers: the object span gets the start marker [O:entity_type] and end marker [/O:entity_type], and the subject span gets [S:entity_type] and [/S:entity_type]. The transformed sentence is fed into the PLM, the contextual representations of the subject's and object's start markers are concatenated as the pair representation, and the relation type is predicted via a linear layer and softmax. (Hereinafter PURE-Full; a toy sketch of this marker insertion follows this list.)
  • Problem: since only one subject-object span pair can be inserted per sample, every candidate pair requires its own forward pass, so training and inference are computationally heavy.
  • For a detailed analysis of PURE, see https://github.com/km1994/nlp_paper_study_information_extraction/tree/main/information_extraction/ERE_study/PURE
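
To make PURE-Full's input transformation concrete, here is a toy sketch (the marker strings, function name, and non-overlap assumption are mine, for illustration only):

def insert_typed_markers(tokens, subj, obj):
    """Sketch: wrap subject/object spans with PURE-style typed markers.

    subj and obj are (start, end, entity_type) with inclusive token
    indices; assumes the two spans do not overlap.
    """
    (ss, se, st), (os_, oe, ot) = subj, obj
    inserts = [
        (ss, f"[S:{st}]"), (se + 1, f"[/S:{st}]"),
        (os_, f"[O:{ot}]"), (oe + 1, f"[/O:{ot}]"),
    ]
    out = list(tokens)
    # Insert from right to left so earlier indices stay valid.
    for pos, marker in sorted(inserts, key=lambda x: -x[0]):
        out.insert(pos, marker)
    return out

tokens = "Bill Gates founded Microsoft".split()
print(insert_typed_markers(tokens, (0, 1, "PER"), (3, 3, "ORG")))
# ['[S:PER]', 'Bill', 'Gates', '[/S:PER]', 'founded',
#  '[O:ORG]', 'Microsoft', '[/O:ORG]']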

The paper's approach

  • PL-Marker builds on the following optimizations
    • Optimization one:
      • Motivation: the computational cost problem in PURE
      • Ideas:
        • First, put all typed markers at the end of the sentence;
        • Then let each typed marker share its position embedding with the start token or end token of the corresponding subject/object span, and impose restrictions via the attention mask matrix;
        • A typed marker can attend only to the text and to the typed markers of its own span pair, while text tokens can attend only to text.
      • Advantage: multiple span pairs in one sentence can be processed in parallel; this acceleration improves the speed of relation extraction.
      • Disadvantage: the accuracy drops slightly.
    • Optimization two:
      • Motivation: the performance degradation caused by optimization one
      • In previous work there are three span representations (see the preface for details).
      • The paper's span representation method:
        • combines solid markers and levitated markers

        • [Figure: combining solid and levitated markers]

        • Two ways of packing the markers are proposed:

          • For NER: neighborhood-oriented packing for spans, which packs the levitated markers of neighboring spans into the same training instance;
          • For RE: subject-oriented packing for span pairs, which marks the subject span inside the sentence with solid markers and appends the levitated markers of its candidate object spans at the end of the sentence within a single instance.

Overall framework

The two marker-packing strategies of PL-Marker

The model used by PL-Marker is a PLM, used similarly to PURE-Approx:

  • The start marker of each levitated marker pair appended after the sentence shares its position embedding with the start token of the corresponding span, and the end marker shares its position embedding with the end token of the corresponding span (see the position-id sketch below);
  • Directed attention ties each levitated marker pair together, so that a marker can see its partner marker and the text before it, but not other marker pairs, while the text can only see the text.
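
A minimal sketch of the position-id sharing (the [text | start markers | end markers] layout matches the slicing seen later in the model code, but the names are my assumptions): each appended marker reuses the position id of the boundary token of its span, so the original sentence's position embeddings are untouched:

import torch

def build_position_ids(text_len, spans):
    """Sketch: position ids for a [text | start markers | end markers] input.

    spans: list of (start, end) token indices into the text; each
    levitated marker shares the position id of its boundary token.
    """
    pos = list(range(text_len))      # normal positions for the text
    pos += [s for s, _ in spans]     # start markers reuse start-token ids
    pos += [e for _, e in spans]     # end markers reuse end-token ids
    return torch.tensor(pos)

print(build_position_ids(6, [(1, 2), (4, 5)]))
# tensor([0, 1, 2, 3, 4, 5, 1, 4, 2, 5])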

[Figure: PL-Marker's packing of levitated markers]

1. NER stage

This part uses levitated markers, putting the levitated marker pairs of all possible entity spans at the end of the sentence. The problem is that all possible spans in the sentence must be enumerated, while the input length a PLM can handle is limited.

The author therefore proposes a packing strategy. When packing, to better distinguish span boundaries (and, more importantly, to distinguish spans beginning with the same word), similar spans are put together, i.e. spans with the same or nearby starts go into the same training instance.

For a sentence X = {x_1, x_2, ..., x_N} with N tokens and a maximum span length L, the specific steps are as follows:

  1. First, all levitated marker pairs (one start marker, one end marker) are sorted: ascending by the start token of the span each pair represents, then ascending by the position of the end token, giving a sorted list of candidate spans.
  2. Then, all the levitated markers are split into K groups, so that the markers of neighboring spans fall into the same group; each of the K marker sequences is appended after the sentence's token sequence, generating K training instances (in the figure above, [O] and [/O] denote levitated markers). This is the neighborhood-oriented packing strategy; in effect it exhaustively enumerates all spans (a sketch of this packing follows this list).
  3. Next, each training instance is fed into the PLM (e.g. BERT). For the marker pair of each span s_i = (a, b), the representation h_a^{(s)} of its start marker and the representation h_b^{(e)} of its end marker are concatenated as the span representation: φ(s_i) = [h_a^{(s)}; h_b^{(e)}].
  4. For NER, the span representation from the steps above (the span feature extracted by PL-Marker) is combined with the span representation extracted by the T-Concat method to predict the entity type; exactly how they are combined is shown in the code below.
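
A minimal sketch of steps 1-2 (the group size and all names are illustrative assumptions):

def pack_spans(num_tokens, max_span_len, group_size):
    """Sketch of neighborhood-oriented packing for NER.

    Enumerates every candidate span (a, b) with length <= max_span_len,
    sorts by start token then end token, and chops the sorted list into
    groups of at most group_size; each group becomes one training
    instance whose levitated markers are appended after the sentence.
    """
    spans = [(a, b)
             for a in range(num_tokens)
             for b in range(a, min(a + max_span_len, num_tokens))]
    spans.sort()  # ascending by start token, then by end token
    return [spans[i:i + group_size]
            for i in range(0, len(spans), group_size)]

groups = pack_spans(num_tokens=10, max_span_len=4, group_size=16)
print(len(groups), groups[0][:5])  # neighboring spans stay in one group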

2. RE stage

As mentioned above, in the RE stage this paper mixes solid markers and levitated markers: solid markers mark the subject span, and levitated markers mark the candidate objects.

Suppose the input sequence is X, the subject span is s_i = (a, b), and its candidate object spans are (c_1, d_1), (c_2, d_2), ..., (c_m, d_m). The specific procedure is as follows:

  • Insert solid markers [S] and [/S] before and after the subject span in the sentence, then append the corresponding candidate object spans after the text as levitated marker pairs [O] and [/O] (as shown in the figure above). The sentence X = {x_1, ..., x_n} is transformed as follows (the symbol ∪ denotes position-embedding sharing):
    [Figure: the transformed sequence, with [S]/[/S] inserted around the subject and the appended [O]/[/O] pairs sharing position embeddings with their object span boundaries]

  • Feed the training instance into the PLM. For each span pair {s_subject, s_object} = {(a, b), (c, d)} in the sample, the representations h_{a-1} and h_{b+1} of the solid markers before and after the subject span are concatenated with the representations h_c^{(s)} and h_d^{(e)} of the object span's levitated marker pair as the span-pair representation: φ(s_i, s_j) = [h_{a-1}; h_{b+1}; h_c^{(s)}; h_d^{(e)}].

  • To model the connection between entity types and relation types, an auxiliary loss for predicting the object type is also added.

  • To add complementary information, they also predict the inverse relation from object to subject, realizing bidirectional relation prediction; in effect this is an object-oriented packing strategy (the bidirectional predicted inverse relation). Removing the inverse relation caused a performance drop of 0.9%-1.1%, which shows the importance of modeling the information between object and subject in this asymmetric framework. (A sketch of the packed input follows this list.)
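
A rough sketch of the subject-oriented packing of one training instance (the marker strings and the pairwise layout are simplifying assumptions; in the actual model the appended markers would additionally share position ids with their object span boundaries, as sketched earlier):

def pack_subject_with_objects(tokens, subject, objects):
    """Sketch: one RE instance = subject in solid markers + all its
    candidate objects as levitated marker pairs appended at the end.

    subject: (a, b) inclusive token span; objects: list of (c, d) spans.
    """
    a, b = subject
    out = tokens[:a] + ["[S]"] + tokens[a:b + 1] + ["[/S]"] + tokens[b + 1:]
    for _ in objects:  # one [O]/[/O] pair per candidate object
        out += ["[O]", "[/O]"]
    return out

tokens = "Bill Gates founded Microsoft in Albuquerque".split()
print(pack_subject_with_objects(tokens, (0, 1), [(3, 3), (5, 5)]))
# ['[S]', 'Bill', 'Gates', '[/S]', 'founded', 'Microsoft', 'in',
#  'Albuquerque', '[O]', '[/O]', '[O]', '[/O]']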

Training

1.1 ACEDatasetNER

Switches between different label lists depending on the dataset, and sets self.max_entity_length = args.max_pair_length * 2.
def is_punctuation: checks whether a character is a punctuation mark
def get_original_token: converts special symbols back into brackets
def tokenize_word: special-cases RobertaTokenizer (leading-space handling for words of length > 1 that are not punctuation) versus other tokenizers
def initialize: builds ner_label_map; for each line, data = json.loads(line), and subword2token / token2subword record the word-subword alignment, i.e. how each word was split by the tokenizer (a sketch of this alignment follows below)
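
The subword2token / token2subword bookkeeping is essentially a word-to-subword alignment; a minimal sketch (the tokenizer choice and all names are assumptions) looks like:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def align_words_to_subwords(words):
    """Sketch: map each word to its subwords and back."""
    subwords, token2subword, subword2token = [], [], []
    for wi, word in enumerate(words):
        pieces = tokenizer.tokenize(word)
        token2subword.append(len(subwords))       # first subword of word wi
        subwords.extend(pieces)
        subword2token.extend([wi] * len(pieces))  # each piece -> its word
    return subwords, token2subword, subword2token

print(align_words_to_subwords("unaffable things happen".split()))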

1.2 for _ in train_iterator:

t_total = len(train_dataloader) // args.gradient_accumulation_steps * args.num_train_epochs
num_warmup_steps=int(0.1*t_total)

for _ in train_iterator: for step, batch in enumerate(epoch_iterator):
note: if 'span' in args.model_type, the dataset preprocessing differs and inputs['mention_pos'] = batch[4]
After outputs = model(**inputs):
loss = outputs[0]  # model outputs are always tuples in pytorch-transformers (see docs)
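
The t_total / num_warmup_steps bookkeeping above corresponds to the standard linear warm-up schedule from transformers; a minimal sketch with assumed illustrative values:

import torch
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(4, 2)  # stand-in for the BERT model
num_batches, grad_accum, epochs = 1000, 2, 10  # illustrative values
t_total = num_batches // grad_accum * epochs

optimizer = AdamW(model.parameters(), lr=2e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * t_total),  # 10% of steps for warm-up
    num_training_steps=t_total,
)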

2 BertForSpanMarkerNER (the default NER model in the code)

self.ner_classifier = nn.Linear(config.hidden_size*4, self.num_labels)
self.alpha = torch.tensor([config.alpha] + [1.0] * (self.num_labels-1), dtype=torch.float32)

BertModel outputs: sequence_output, pooled_output, (hidden_states), (attentions); the last two are included only when requested.

    def forward(
        self,
        input_ids=None,
        attention_mask=None,
        mentions=None,
        token_type_ids=None,
        position_ids=None,
        head_mask=None,
        inputs_embeds=None,
        labels=None,
        mention_pos=None,
        full_attention_mask=None,
    ):
        
        outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            full_attention_mask=full_attention_mask,
        )
        hidden_states = outputs[0] #=sequence_output
        if self.onedropout:
            hidden_states = self.dropout(hidden_states)

        seq_len = self.max_seq_length
        bsz, tot_seq_len = input_ids.shape
        ent_len = (tot_seq_len-seq_len) // 2  # number of levitated marker pairs, e.g. (1024-512)//2

        e1_hidden_states = hidden_states[:, seq_len:seq_len+ent_len]  # start-marker states: PL-Marker's span feature h_a^(s)
        e2_hidden_states = hidden_states[:, seq_len+ent_len: ]  # end-marker states: h_b^(e)


        m1_start_states = hidden_states[torch.arange(bsz).unsqueeze(-1), mention_pos[:, :, 0]]  # contextual representation of each span's start token
        m1_end_states = hidden_states[torch.arange(bsz).unsqueeze(-1), mention_pos[:, :, 1]]  # ... and end token (len = ent_len); these two form the T-Concat span representation

        feature_vector = torch.cat([e1_hidden_states, e2_hidden_states, m1_start_states, m1_end_states], dim=2)  # concatenated differently from BiNER
        if not self.onedropout:
            feature_vector = self.dropout(feature_vector)
        ner_prediction_scores = self.ner_classifier(feature_vector)


        outputs = (ner_prediction_scores, ) + outputs[2:]  # Add hidden states and attention if they are here

        if labels is not None:
            loss_fct_ner = CrossEntropyLoss(ignore_index=-1,  weight=self.alpha.to(ner_prediction_scores))
            ner_loss = loss_fct_ner(ner_prediction_scores.view(-1, self.num_labels), labels.view(-1))
            outputs = (ner_loss, ) + outputs

        return outputs #  (ner_loss, ner_prediction_scores) + (hidden states , attention)

Comparison with PURE

[Figure: PURE's span representation]
PURE represents a span as h_e(s_i) = [x_start; x_end; φ(s_i)], where φ(s_i) is the embedding of the learned span-width feature; how the width embedding is learned and how the span representation h_e(s_i) is assembled is shown in the code below.

def batchify(samples, batch_size):
    """
    Batchfy samples with a batch size
    """
    num_samples = len(samples)

    list_samples_batches = []
    
    # if a sentence is too long, make itself a batch to avoid GPU OOM
    to_single_batch = []
    for i in range(0, len(samples)):
        if len(samples[i]['tokens']) > 350:
            to_single_batch.append(i)
    
    for i in to_single_batch:
        logger.info('Single batch sample: %s-%d', samples[i]['doc_key'], samples[i]['sentence_ix'])
        list_samples_batches.append([samples[i]])
    samples = [sample for i, sample in enumerate(samples) if i not in to_single_batch]

    for i in range(0, len(samples), batch_size):
        list_samples_batches.append(samples[i:i+batch_size])

    assert(sum([len(batch) for batch in list_samples_batches]) == num_samples)

    return list_samples_batches

spans = sample["spans"]  # per-sample field of the batched samples

def _get_input_tensors(self, tokens, spans, spans_ner_label):
        start2idx = []
        end2idx = []
        
        bert_tokens = []
        bert_tokens.append(self.tokenizer.cls_token)
        for token in tokens:  # tokenize each word, recording its first/last subword index
            start2idx.append(len(bert_tokens))
            sub_tokens = self.tokenizer.tokenize(token)
            bert_tokens += sub_tokens
            end2idx.append(len(bert_tokens)-1)
        bert_tokens.append(self.tokenizer.sep_token)

        indexed_tokens = self.tokenizer.convert_tokens_to_ids(bert_tokens)
        tokens_tensor = torch.tensor([indexed_tokens])

        bert_spans = [[start2idx[span[0]], end2idx[span[1]], span[2]] for span in spans]
        bert_spans_tensor = torch.tensor([bert_spans])

        spans_ner_label_tensor = torch.tensor([spans_ner_label])

        return tokens_tensor, bert_spans_tensor, spans_ner_label_tensor

spans = bert_spans_tensor

def _get_span_embeddings(self, input_ids, spans, token_type_ids=None, attention_mask=None):
        sequence_output, pooled_output = self.albert(input_ids=input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask)
        
        sequence_output = self.hidden_dropout(sequence_output)

        """
        spans: [batch_size, num_spans, 3]; 0: left_end, 1: right_end, 2: width
        spans_mask: (batch_size, num_spans, )
        """
        spans_start = spans[:, :, 0].view(spans.size(0), -1)
        spans_start_embedding = batched_index_select(sequence_output, spans_start)
        spans_end = spans[:, :, 1].view(spans.size(0), -1)
        spans_end_embedding = batched_index_select(sequence_output, spans_end)

        spans_width = spans[:, :, 2].view(spans.size(0), -1)
        spans_width_embedding = self.width_embedding(spans_width)

        spans_embedding = torch.cat((spans_start_embedding, spans_end_embedding, spans_width_embedding), dim=-1)
        """
        spans_embedding: (batch_size, num_spans, hidden_size*2+embedding_dim)
        """
        return spans_embedding

When markers are inserted into the original sentence, the contextual representations change, because the markers alter the sentence structure. To be able to reuse the word representations, the following strategy is adopted: the markers' position information is bound to the positions of the span's start and end tokens, so the position embeddings of the original sentence are unchanged. The attention layers are then constrained: text tokens are forced to attend only to text tokens and not to marker tokens, while the four marker tokens of a span pair attend to each other. This change makes the text computations reusable. In practice, all markers are appended to the end of the sentence.
(Quoted from the CSDN blogger "alkaid_sjtu"; original: https://blog.csdn.net/weixin_44047857/article/details/122074084)

3 BertForSpanMarkerBiNER (implements the formula in the paper)

# "..." denotes omitted code; same below
    def forward(
        ...
    ):
        outputs = self.bert(
            ...
        )
        hidden_states = outputs[0]
        if self.onedropout:
            hidden_states = self.dropout(hidden_states)

        seq_len = self.max_seq_length
        bsz, tot_seq_len = input_ids.shape
        ent_len = (tot_seq_len-seq_len) // 2

        e1_hidden_states = hidden_states[:, seq_len:seq_len+ent_len]
        e2_hidden_states = hidden_states[:, seq_len+ent_len: ]


        m1_start_states = hidden_states[torch.arange(bsz).unsqueeze(-1), mention_pos[:, :, 0]]
        m1_end_states = hidden_states[torch.arange(bsz).unsqueeze(-1), mention_pos[:, :, 1]]

        m1 = torch.cat([e1_hidden_states, m1_start_states], dim=2)  # h_a^(s): start-marker + start-token states
        m2 = torch.cat([e2_hidden_states, m1_end_states], dim=2)  # h_b^(e): end-marker + end-token states
        
        feature_vector = torch.cat([m1, m2], dim=2)
        if not self.onedropout:
            feature_vector = self.dropout(feature_vector)
        ner_prediction_scores = self.ner_classifier(feature_vector)

        # m1 = self.dropout(self.reduce_dim(m1))
        # m2 = self.dropout(self.reduce_dim(m2))

        m1 = F.gelu(self.reduce_dim(m1))
        m2 = F.gelu(self.reduce_dim(m2))


        ner_prediction_scores_bilinear = self.blinear(m1, m2)

        ner_prediction_scores = ner_prediction_scores + ner_prediction_scores_bilinear

        outputs = (ner_prediction_scores, ) + outputs[2:]  # Add hidden states and attention if they are here

        if labels is not None:
            loss_fct_ner = CrossEntropyLoss(ignore_index=-1,  weight=self.alpha.to(ner_prediction_scores))
            ner_loss = loss_fct_ner(ner_prediction_scores.view(-1, self.num_labels), labels.view(-1))
            outputs = (ner_loss, ) + outputs

        return outputs

4 ACEDataset (RE module)

Compared with ACEDatasetNER in 1.1, it additionally has:

        self.max_pair_length = max_pair_length
        self.max_entity_length = self.max_pair_length*2
        self.use_typemarker = args.use_typemarker
        self.no_sym = args.no_sym  # with no_sym, label_list gains two labels and self.sym_labels = ['NIL'] (without 'PER-SOC')

5 BertForACEBothOneDropoutSub (the default RE model in the code)

           inputs = {'input_ids':      batch[0],
                     'attention_mask': batch[1],
                     'position_ids':   batch[2],
                     'labels':         batch[5],
                     'ner_labels':     batch[6],
                     }
           inputs['sub_positions'] = batch[3]
           inputs['mention_pos'] = batch[4]
           inputs['sub_ner_labels'] = batch[7]
def forward(
    self,
    input_ids=None,
    attention_mask=None,
    mentions=None,
    token_type_ids=None,
    position_ids=None,
    head_mask=None,
    inputs_embeds=None,
    sub_positions=None,
    labels=None,
    ner_labels=None,
):
    
    outputs = self.bert(
        input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
    )
    hidden_states = outputs[0]
    hidden_states = self.dropout(hidden_states)
    seq_len = self.max_seq_length
    bsz, tot_seq_len = input_ids.shape
    ent_len = (tot_seq_len-seq_len) // 2

    e1_hidden_states = hidden_states[:, seq_len:seq_len+ent_len]
    e2_hidden_states = hidden_states[:, seq_len+ent_len: ]

    feature_vector = torch.cat([e1_hidden_states, e2_hidden_states], dim=2)

    ner_prediction_scores = self.ner_classifier(feature_vector)


    m1_start_states = hidden_states[torch.arange(bsz), sub_positions[:, 0]]  # state at the subject's start position
    m1_end_states = hidden_states[torch.arange(bsz), sub_positions[:, 1]]  # state at the subject's end position
    m1_states = torch.cat([m1_start_states, m1_end_states], dim=-1)  # subject representation

    m1_scores = self.re_classifier_m1(m1_states)  # bsz, num_label
    m2_scores = self.re_classifier_m2(feature_vector)  # bsz, ent_len, num_label
    re_prediction_scores = m1_scores.unsqueeze(1) + m2_scores  # broadcast subject scores over all candidate objects

    outputs = (re_prediction_scores, ner_prediction_scores) + outputs[2:]  # Add hidden states and attention if they are here

    if labels is not None:
        loss_fct_re = CrossEntropyLoss(ignore_index=-1,  weight=self.alpha.to(re_prediction_scores))
        loss_fct_ner = CrossEntropyLoss(ignore_index=-1)
        re_loss = loss_fct_re(re_prediction_scores.view(-1, self.num_labels), labels.view(-1))
        ner_loss = loss_fct_ner(ner_prediction_scores.view(-1, self.num_ner_labels), ner_labels.view(-1))

        loss = re_loss + ner_loss
        outputs = (loss, re_loss, ner_loss) + outputs

    return outputs  # (loss, re_loss, ner_loss), re_prediction_scores, ner_prediction_scores, (hidden_states), (attentions)

6 BertForACEBothOneDropoutLeviPair (matches the RE formula in the paper)

# "..." denotes omitted code; same below
    def forward(
        ...
    ):
        outputs = self.bert(
            ...
        )
        hidden_states = outputs[0]
        hidden_states = self.dropout(hidden_states)
        seq_len = self.max_seq_length
        bsz, tot_seq_len = input_ids.shape
        ent_len = (tot_seq_len-seq_len) // 4  # four marker positions per span pair

        e1_hidden_states = hidden_states[:, seq_len:seq_len+ent_len]  # h_{a-1}
        e2_hidden_states = hidden_states[:, seq_len+ent_len*1: seq_len+ent_len*2]  # h_{b+1}
        e3_hidden_states = hidden_states[:, seq_len+ent_len*2: seq_len+ent_len*3]  # h_c^(s)
        e4_hidden_states = hidden_states[:, seq_len+ent_len*3: seq_len+ent_len*4]  # h_d^(e)

        m1_feature_vector = torch.cat([e1_hidden_states, e2_hidden_states], dim=2)
        m2_feature_vector = torch.cat([e3_hidden_states, e4_hidden_states], dim=2)
        feature_vector = torch.cat([m1_feature_vector, m2_feature_vector], dim=2)  # φ(s_i, s_j) = [h_{a-1}; h_{b+1}; h_c^(s); h_d^(e)]

        m1_ner_prediction_scores = self.ner_classifier(m1_feature_vector)
        m2_ner_prediction_scores = self.ner_classifier(m2_feature_vector)


        re_prediction_scores = self.re_classifier(feature_vector) # bsz, ent_len, num_label

        outputs = (re_prediction_scores, m1_ner_prediction_scores, m2_ner_prediction_scores) + outputs[2:]  # Add hidden states and attention if they are here

        if labels is not None:
            loss_fct_re = CrossEntropyLoss(ignore_index=-1,  weight=self.alpha.to(re_prediction_scores))
            loss_fct_ner = CrossEntropyLoss(ignore_index=-1)
            re_loss = loss_fct_re(re_prediction_scores.view(-1, self.num_labels), labels.view(-1))
            m1_ner_loss = loss_fct_ner(m1_ner_prediction_scores.view(-1, self.num_ner_labels), m1_ner_labels.view(-1))
            m2_ner_loss = loss_fct_ner(m2_ner_prediction_scores.view(-1, self.num_ner_labels), m2_ner_labels.view(-1))

            loss = re_loss + m1_ner_loss + m2_ner_loss
            outputs = (loss, re_loss, m1_ner_loss+m2_ner_loss) + outputs

        return outputs  # (loss, re_loss, m1+m2 ner_loss), re_prediction_scores, m1/m2 ner_prediction_scores, (hidden_states), (attentions)

For the relation model, PURE concatenates the subject's and object's outputs and embeddings [BertForRelation and BertForRelationApprox].

PL-marker vs PURE

In general, PL-Marker's improvements over PURE mainly include:

For PURE's levitated-marker method, a packed levitated marker strategy is proposed.
In the NER stage, PURE uses a standard span-based method (T-Concat), while this paper also applies the packed levitated markers to the NER task.
In the RE stage, the inverse relation is introduced to further improve RE performance; meanwhile PURE's typed markers are dropped, replaced by an auxiliary entity-type loss.

PS

  • The paper found experimentally that, compared with PURE, using typed markers hurts performance. Intuitively, though, entity type information should help relation classification: for example, between the candidate relations "date of birth" and "nationality", if the two entities have types "person" and "time", the relation should be "date of birth" rather than "nationality". The experimental result may come from how the type information is used and from the model structure, which could be improved in future work.
  • The NER stage reduces training time by grouping, but the time cost is still higher than the previous SOTA, since it enumerates all possible spans. Besides grouping, is there another way to reduce the computation?
