PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

Very recently, Microsoft released Turing-NLG, the largest natural language generation model to date with 17 billion parameters, which achieves SOTA on a number of language-modeling benchmarks.



What stands out about Turing-NLG is its performance on text summarization: because it is already very good at understanding text, it can beat existing models without needing much paired data.



From Facebook's BART to Google's PEGASUS and now Microsoft's Turing-NLG, larger pre-training datasets, higher-capacity Transformer models, and new pre-training objectives keep delivering very visible gains on natural language processing tasks, and they also prove, once again, that having money really helps ~

Source: Turing-NLG: A 17-billion-parameter language model by Microsoft


PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization



PEGASUS (Pre-training with Extracted Gap-sentences for Abstractive Summarization) is a new automatic text summarization model proposed by Google Brain and Imperial College London. PEGASUS is also built on the Transformer, but it introduces GSG (Gap Sentences Generation), a new self-supervised pre-training objective designed for the summarization task itself. Experiments show that it achieved SOTA on 12 text summarization datasets at the time, and the authors note that it also performs well in low-resource settings.
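
For readers who want to try PEGASUS directly, the fine-tuned checkpoints that were later released through the Hugging Face transformers library can be loaded in a few lines. This is a minimal usage sketch under that assumption (the checkpoint name google/pegasus-xsum and the generation settings are illustrative), not the authors' original TensorFlow code:

```python
# Minimal sketch: summarize a document with a released PEGASUS checkpoint.
# Assumes the Hugging Face `transformers` library (plus torch and sentencepiece)
# and the public `google/pegasus-xsum` checkpoint; not the authors' original code.
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

model_name = "google/pegasus-xsum"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

document = (
    "PEGASUS is pre-trained by removing important sentences from a document "
    "and training the model to regenerate them from the remaining text."
)

inputs = tokenizer(document, truncation=True, padding="longest", return_tensors="pt")
summary_ids = model.generate(**inputs, num_beams=4, max_length=64)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0])
```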

Text summarization, a fundamental and important NLP task, has attracted a great deal of research attention. In recent years, riding the wave of deep neural networks, different researchers and institutions have proposed a variety of solutions and modeling tricks. Although there are many related papers, few of them truly push the task forward; most are transfers of models from similar tasks or small modeling tweaks.

For more on text summarization, see my many earlier blog posts.

Whether BERT, BART, or PEGASUS is used to tackle text summarization, what matters most is the model's ability to understand text. To that end, BERT is pre-trained with the MLM (Masked Language Model) and NSP (Next Sentence Prediction) objectives, while BART is modeled in the style of a denoising autoencoder.



With this setup, BART achieves better results than previous models on the two datasets mentioned in the paper, CNN/DailyMail and XSum.
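
To make the contrast concrete, here is a minimal, self-contained sketch of the two kinds of input corruption behind these objectives: BERT-style token masking (MLM) and a BART-style denoising corruption (replacing a span with a single mask). It is purely illustrative, uses whitespace tokenization instead of real subword vocabularies, and the helper names are my own rather than anything from either paper's code:

```python
import random

MASK = "[MASK]"

def bert_mlm_corrupt(tokens, mask_prob=0.15):
    """BERT-style MLM: pick ~15% of tokens; of those, 80% become [MASK],
    10% a random token, 10% stay unchanged. Targets are the original tokens."""
    corrupted, targets = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            targets.append(tok)                    # this position is predicted
            r = random.random()
            if r < 0.8:
                corrupted.append(MASK)
            elif r < 0.9:
                corrupted.append(random.choice(tokens))
            else:
                corrupted.append(tok)
        else:
            targets.append(None)                   # not predicted
            corrupted.append(tok)
    return corrupted, targets

def bart_style_corrupt(tokens, span_len=3):
    """BART-like text infilling: replace one random span with a single [MASK];
    the decoder then has to reconstruct the full original sequence."""
    start = random.randrange(max(1, len(tokens) - span_len))
    corrupted = tokens[:start] + [MASK] + tokens[start + span_len:]
    return corrupted, tokens                       # target = original text

tokens = "pegasus is pre-trained for abstractive text summarization".split()
print(bert_mlm_corrupt(tokens))
print(bart_style_corrupt(tokens))
```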



This paper, in turn, proposes GSG, a novel pre-training objective targeted much more directly at summarization, which further improves the effectiveness of pre-trained models on this task.

The main contributions of the paper are as follows:

  • A new self-supervised pre-training objective designed specifically for abstractive text summarization is proposed
  • Experiments on 12 text summarization datasets demonstrate the effectiveness of the pre-training objective
  • Experiments show that the model also achieves good results in low-resource settings

Reading the paper closely, its core contribution is the pre-training objective GSG, which strengthens the model's ability to learn from text through a new learning scheme. GSG is a sequence-to-sequence self-supervised objective, and its core idea is as follows.



GSG raises masking from the token level to the sentence level. To push the model to understand the text better, the important sentences of a document are first selected and replaced with [MASK1]; tokens in the remaining sentences are then masked with the same strategy as BERT: 80% of the time a token is replaced with [MASK2], 10% of the time with a random token, and 10% of the time it is left unchanged. For choosing the important sentences, the authors experimented with the three strategies below; a code sketch of the whole construction is given after the list.

  • Random: uniformly sample m sentences at random

  • Lead: select the first m sentences

  • Principal: since there is no ground-truth label for sentence importance, the ROUGE1-F1 score of each sentence against the rest of the document is used as a proxy, with higher scores indicating more important sentences: $s_i = \mathrm{rouge}(x_i, D \setminus \{x_i\}),\ \forall i$. The paper gives the corresponding selection algorithm as pseudocode.

The authors also give an example to illustrate how the three selection strategies differ.


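To make the construction concrete, the sketch below puts GSG together end to end: the three selection strategies (with a hand-rolled unigram-overlap F1 standing in for ROUGE1-F1 in the Principal case), replacement of the selected sentences with [MASK1], and the selected sentences themselves as the generation target. It is an illustrative approximation rather than the paper's implementation; the real setup uses a proper ROUGE scorer, subword tokenization, the Ind/Seq and Orig/Uniq variants of Principal selection, and the additional [MASK2] token masking described above:

```python
import random
from collections import Counter

def rouge1_f1(candidate, reference):
    """Unigram-overlap F1, used here as a simple stand-in for ROUGE1-F1."""
    c, r = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / sum(c.values()), overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def select_gap_sentences(sentences, m, strategy="principal"):
    """Return the indices of the m 'important' sentences to mask."""
    if strategy == "random":               # Random: uniform choice of m sentences
        return sorted(random.sample(range(len(sentences)), m))
    if strategy == "lead":                 # Lead: the first m sentences
        return list(range(m))
    # Principal: s_i = rouge(x_i, D \ {x_i}); keep the m highest-scoring sentences
    scores = [
        rouge1_f1(s, " ".join(sentences[:i] + sentences[i + 1:]))
        for i, s in enumerate(sentences)
    ]
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:m]
    return sorted(top)

def make_gsg_example(sentences, m=1, strategy="principal"):
    """Build one GSG pre-training pair: masked document -> gap sentences."""
    picked = set(select_gap_sentences(sentences, m, strategy))
    source = " ".join("[MASK1]" if i in picked else s
                      for i, s in enumerate(sentences))
    target = " ".join(sentences[i] for i in sorted(picked))
    return source, target

doc = [
    "Pegasus is a winged horse in Greek mythology.",
    "PEGASUS masks whole sentences and learns to generate them from the rest.",
    "This objective closely resembles producing an abstractive summary.",
]
print(make_gsg_example(doc, m=1, strategy="principal"))
```

Fine-tuning then simply swaps this self-supervised pair for real (document, summary) pairs, which is why the objective transfers so directly to summarization.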

The experiments section is extensive: essentially all of the available large summarization datasets are used, and the baselines for comparison include the various recently published pre-trained models. The reported results cover:

  • the two PEGASUS model sizes compared against the previous SOTA models
  • PEGASUS_LARGE compared with other pre-trained models on XSum, CNN/DailyMail, and Gigaword
  • fine-tuning results in low-resource settings, where only a limited number of labeled examples are available

The paper further confirms that stacking up data and model capacity plays a large role in improving results, but that pre-training objectives tailored to a specific task area are just as important to the overall effect. When designing a model, then, beyond choosing a suitable architecture, it matters even more to consider how different approaches can strengthen the model's ability to understand text.

Origin blog.csdn.net/Forlogen/article/details/104271374