Boundary Smoothing for NER

1. Summary

Named entity recognition (NER) models are prone to over-confidence, which degrades performance. Motivated by the inherent ambiguity of entity boundaries and inspired by label smoothing, the authors propose boundary smoothing as a regularization technique for span-based NER models. Instead of placing all probability mass on the annotated spans, it redistributes part of each entity's probability to the spans surrounding it.

The resulting model achieves SOTA results on eight commonly used NER benchmarks. Further empirical analysis shows that boundary smoothing effectively alleviates neural-model over-confidence, improves model calibration, and leads to smoother model predictions.

2. Introduction

Recently, span-based models have gained popularity in NER research and achieved state-of-the-art (SOTA) results. These models typically enumerate all candidate spans and classify each one. Since annotated spans in a sentence are scarce, training tends to push the confidence of annotated spans close to 1 while the confidence of every other candidate span is pushed to 0. Such a sharp, unsmoothed target distribution over adjacent spans can hurt the trainability of neural networks. Furthermore, empirical evidence shows that these models are prone to the over-confidence problem, i.e., they predict entities with confidence much higher than their actual probability of being correct. This is a manifestation of calibration error (Guo et al., 2017).

Inspired by label smoothing (Szegedy et al., 2016; Müller et al., 2019), the authors propose boundary smoothing as a regularization technique for span-based neural NER models. Reassigning probability mass around the boundaries of annotated entities alleviates the over-confidence problem. The authors further demonstrate that boundary smoothing helps the trained NER model stay calibrated, so that the predicted confidence better reflects the probability that a predicted entity is correct, and the model generalizes better.

3. Method

3.1 Biaffine Decoder

Under the span-based NER framework, the decoder predicts entities by scoring every candidate (start, end) position pair: each token is mapped to a start representation h_s and an end representation h_e, and a biaffine transformation over the two produces a score matrix covering all candidate spans.

[Figure: biaffine decoder formulation]
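
As an illustration, below is a minimal sketch of such a biaffine span scorer in the style of Yu et al. (2020): for label l, the score of span (i, j) is r_l(i, j) = h_s(i)ᵀ U_l h_e(j), with the linear and bias terms folded in by appending a constant 1 to each representation. Class and argument names are illustrative, not the authors' code; the biaffine hidden size (150) and dropout rate (0.2) follow the settings in Section 4.2 below, and enc_dim = 400 assumes the one-layer BiLSTM there outputs 200 dimensions per direction.

```python
import torch
import torch.nn as nn

class BiaffineSpanScorer(nn.Module):
    """Minimal biaffine span scorer sketch (in the style of Yu et al., 2020)."""

    def __init__(self, enc_dim=400, biaffine_dim=150, num_labels=5, dropout=0.2):
        super().__init__()
        self.start_mlp = nn.Sequential(
            nn.Linear(enc_dim, biaffine_dim), nn.ReLU(), nn.Dropout(dropout))
        self.end_mlp = nn.Sequential(
            nn.Linear(enc_dim, biaffine_dim), nn.ReLU(), nn.Dropout(dropout))
        # U has shape (num_labels, d+1, d+1); the extra dimension carries
        # the linear and bias terms of the biaffine form.
        self.U = nn.Parameter(
            torch.empty(num_labels, biaffine_dim + 1, biaffine_dim + 1))
        nn.init.xavier_uniform_(self.U)

    def forward(self, enc):                       # enc: (batch, n, enc_dim)
        hs = self.start_mlp(enc)                  # (batch, n, d)
        he = self.end_mlp(enc)
        # append a constant 1 to fold bias terms into the bilinear product
        hs = torch.cat([hs, torch.ones_like(hs[..., :1])], dim=-1)
        he = torch.cat([he, torch.ones_like(he[..., :1])], dim=-1)
        # scores[b, l, i, j] = hs[b, i]^T  U[l]  he[b, j]
        return torch.einsum("bid,ldk,bjk->blij", hs, self.U, he)
```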

3.2 Boundary Smoothing

Given an annotated entity, a portion ε of its probability mass is assigned to the surrounding spans, and the remaining probability 1 − ε stays on the originally annotated span. With smoothing size D, all surrounding spans whose Manhattan distance d from the annotated span satisfies d ≤ D share the probability ε equally. Any remaining probability for each candidate span is then assigned to the "non-entity" label; the resulting soft targets are the boundary-smoothed labels.

[Figure: illustration of boundary smoothing]
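
A minimal sketch of building such boundary-smoothed targets follows, assuming an inclusive-end (start, end, label) span encoding, label index 0 for "non-entity", and that ε is shared equally among the valid surrounding spans; the function and tensor layout are illustrative, not the authors' implementation.

```python
import torch

def boundary_smooth_targets(gold_spans, seq_len, num_labels, eps=0.1, D=1):
    """Build a (num_labels, seq_len, seq_len) soft target tensor with
    boundary smoothing. gold_spans holds (start, end, label) triples
    with inclusive ends; label 0 is assumed to be "non-entity"."""
    target = torch.zeros(num_labels, seq_len, seq_len)
    for start, end, label in gold_spans:
        # valid surrounding spans within Manhattan distance D of (start, end)
        neighbors = [
            (start + ds, end + de)
            for ds in range(-D, D + 1)
            for de in range(-D, D + 1)
            if 1 <= abs(ds) + abs(de) <= D
            and 0 <= start + ds <= end + de < seq_len
        ]
        if neighbors:
            target[label, start, end] += 1.0 - eps
            for s, e in neighbors:                # eps shared equally
                target[label, s, e] += eps / len(neighbors)
        else:
            target[label, start, end] += 1.0
    # leftover probability mass goes to the non-entity label
    target[0] = (1.0 - target[1:].sum(dim=0)).clamp(min=0.0)
    return target
```

Training would then minimize the cross-entropy between the model's per-span label distribution and these soft targets, just as in standard label smoothing.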

4. Experiment & Results

4.1 Dataset

  • 4 English NER datasets

    • CoNLL 2003
    • OntoNotes 5
    • ACE 2004
    • ACE 2005
  • 4 Chinese datasets

    • OntoNotes 4
    • MSRA
    • Weibo NER
    • Resume NER

    Of these, ACE 2004 and ACE 2005 are nested NER tasks; the others are flat NER.

4.2 Parameter setting

  • English: RoBERTa-base (hidden size 768, 12 layers) + BiLSTM
  • Chinese: BERT-wwm (hidden size 768, 12 layers) + BiLSTM
  • BiLSTM: one layer, hidden size = 200, dropout rate = 0.5
  • Biaffine decoder: hidden size = 150, dropout rate = 0.2
  • Boundary smoothing parameter ε ∈ {0.1, 0.2, 0.3}
  • Smoothing size D ∈ {1, 2}
  • AdamW optimizer with gradient clipping
  • 50 training epochs
  • Batch size 48
  • Learning rates of 1e-3 and 3e-3
  • Randomly initialized weights
  • Linear warmup over the first 20% of steps (sketched below)
  • F1 as the evaluation metric
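
For concreteness, the optimization settings above might be wired up as follows; the post-warmup linear decay, the total step count, and the clipping norm are assumptions rather than reported settings.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

def build_optimization(model, lr=1e-3, total_steps=10_000, warmup_frac=0.2):
    """AdamW with linear warmup over the first 20% of steps; the linear
    decay afterwards is an assumption, not a reported setting."""
    optimizer = AdamW(model.parameters(), lr=lr)
    warmup_steps = max(1, int(total_steps * warmup_frac))

    def lr_lambda(step):
        if step < warmup_steps:
            return step / warmup_steps            # linear warmup
        # assumed linear decay to zero after warmup
        return max(0.0, (total_steps - step) / (total_steps - warmup_steps))

    return optimizer, LambdaLR(optimizer, lr_lambda)

# Per training step (the clipping norm of 1.0 is an assumed value):
#   loss.backward()
#   torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
#   optimizer.step(); scheduler.step(); optimizer.zero_grad()
```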

4.3 Experiment

4.3.1 Baseline settings
  • English: RoBERTa-base + BiLSTM + biaffine
  • Chinese: BERT + BiLSTM + biaffine
4.3.2 Results

[Table: main results on the eight NER benchmarks]

4.3.3 Ablation experiment

Ablations are performed on CoNLL 2003, ACE 2005, and Resume NER.

[Table: ablation results]

4.4 Confidence and Entity Calibration

To investigate over-confidence formally, the authors plot reliability diagrams and compute the expected calibration error (ECE). Briefly, for the NER model, all predicted entities are grouped into ten bins by their associated confidence, and the accuracy is computed within each bin. If the model is well calibrated, each bin's accuracy should be close to its confidence level.

[Figure: reliability diagrams]
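
Assuming per-entity confidences and correctness indicators are available, a minimal sketch of the ECE computation described above (the generic recipe, not the authors' code) looks like this:

```python
import numpy as np

def expected_calibration_error(confidences, correct, num_bins=10):
    """Group predictions into confidence bins and average the
    |accuracy - confidence| gap, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, num_bins + 1)
    ece, n = 0.0, len(confidences)
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.sum() / n * gap
    return ece
```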

5. Summary

Built on a simple but strong baseline, the model achieves SOTA results on eight well-known NER benchmarks, covering English and Chinese as well as flat and nested NER tasks. Furthermore, experimental results show that boundary smoothing leads to less over-confidence, better model calibration, flatter neural minima, and smoother loss landscapes. These properties plausibly explain the performance improvement.

These findings shed light on the role of smoothing-style regularization techniques in NER tasks. As discussed, boundary smoothing generally increases the overall F1 score, but recall may decrease slightly; it should therefore be used with caution in recall-sensitive applications. Future work will apply boundary smoothing to more variants of span-based NER models and investigate its effects on a wider range of information-extraction tasks.

6. Personal thinking

The method and experiments in this paper are very simple: a label-smoothing-style trick, boundary smoothing, is added on top of a biaffine span-based NER model. Yet this simple combination, together with the author's description and demonstration of the confidence and calibration concepts, makes a strong case for the paper's point.

Origin blog.csdn.net/be_humble/article/details/128327031