Improving the Robustness of Question Answering Systems to Question Paraphrasing


National University of Singapore

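This paper addresses the robustness of question answering models to question paraphrasing. It introduces two test sets and shows experimentally that data augmentation yields considerably better results on both of them.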

Motivation:


Method:

Train a Paraphrase-Guided Paraphrasing Network:
(source question + paraphrase suggestion (constructed by hand) -> target question (output))

[Figure: how the model input is preprocessed]
Training corpora:
1. WikiAnswers paraphrase corpus: 22 million question pairs, from which 350,000 question pairs are selected.
2. Quora Question Pairs dataset: 280,000 training examples.
Once the model is trained, how is it used to generate paraphrased questions from SQuAD?

1. Extract all n-grams (up to 6-grams) from the source question and remove unigrams that are stopwords.
2. Search the paraphrase database PPDB for paraphrases of the remaining n-grams with an equivalence score above 0.25.
3. The resulting set of paraphrase suggestions is given to the model to generate paraphrased questions.
4. Use the pretrained model by Wieting and Gimpel (2018) to obtain a paraphrase similarity score for each generated question.
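Steps 1 and 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: `STOPWORDS` is a tiny hypothetical stopword list, and `toy_ppdb` is a hypothetical in-memory dictionary standing in for PPDB, mapping a phrase to (paraphrase, equivalence score) pairs. The returned pairs play the role of the paraphrase suggestions in step 3.

```python
from typing import Dict, List, Tuple

# Hypothetical, deliberately tiny stopword list (a real run would use a full list).
STOPWORDS = {"what", "who", "is", "the", "of", "a", "an", "in", "to", "did"}

def extract_ngrams(tokens: List[str], max_n: int = 6) -> List[str]:
    """Step 1: all n-grams up to max_n words, minus unigrams that are stopwords."""
    ngrams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if n == 1 and gram[0] in STOPWORDS:
                continue  # drop stopword unigrams; longer n-grams may still contain them
            ngrams.append(" ".join(gram))
    return ngrams

def paraphrase_suggestions(
    tokens: List[str],
    ppdb: Dict[str, List[Tuple[str, float]]],
    min_score: float = 0.25,
) -> List[Tuple[str, str]]:
    """Step 2: look up each n-gram and keep paraphrases scoring above min_score."""
    suggestions = []
    for phrase in extract_ngrams(tokens):
        for para, score in ppdb.get(phrase, []):
            if score > min_score:
                suggestions.append((phrase, para))
    return suggestions

# Hypothetical PPDB fragment: phrase -> [(paraphrase, equivalence score), ...]
toy_ppdb = {
    "founded": [("established", 0.61), ("found", 0.12)],
    "company": [("firm", 0.43)],
}

question = "who founded the company".split()
print(paraphrase_suggestions(question, toy_ppdb))
# [('founded', 'established'), ('company', 'firm')]
```

Only exact phrase matches are looked up here; a real PPDB lookup would also need tokenization and casing consistent with the database.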

A total of 1,062 paraphrased questions are produced.

Human evaluation:
78.1% of the generated paraphrases are judged to be semantically equivalent, and 78.6% are judged to be fluent.

[Figure: an example]

Adversarial paraphrased questions:

The question is paraphrased using words from the context near a wrong answer candidate of the same type, which produces a natural adversarial example.

This paraphrasing is done manually: the authors go through question and context pairs from the SQuAD development set and rewrite the question.

A total of 56 paraphrased questions are created.


Fine-tuning:

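1. Non-adversarial paraphrased training data (built like the Non-Adversarial Paraphrased Test Set)
Uses the method above, with no human involvement.
Generated paraphrases with a similarity score above 0.9 are kept, to create more diverse paraphrased questions as training data.
25,000 paraphrased questions are randomly sampled and used as additional training data.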
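The selection step can be sketched as below, assuming each candidate already comes with sentence vectors for the source question and the generated question. Cosine similarity over these made-up vectors stands in for the Wieting and Gimpel (2018) paraphrase similarity score; the function name `select_training_paraphrases` and the fixed random seed are illustrative choices, not details from the paper.

```python
import math
import random
from typing import List, Sequence, Tuple

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, standing in for the Wieting and Gimpel (2018) score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def select_training_paraphrases(
    candidates: List[Tuple[str, Sequence[float], Sequence[float]]],
    threshold: float = 0.9,
    sample_size: int = 25_000,
    seed: int = 0,
) -> List[str]:
    """Keep paraphrases whose similarity to the source exceeds threshold,
    then randomly sample at most sample_size of them."""
    kept = [text for text, src_vec, para_vec in candidates
            if cosine(src_vec, para_vec) > threshold]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(kept, min(sample_size, len(kept)))

# Each candidate: (generated question, source-question vector, paraphrase vector).
# The 2-d vectors are made up for illustration.
cands = [
    ("who established the company", [1.0, 0.0], [0.99, 0.1]),  # cosine ~0.995: kept
    ("when was the firm closed", [1.0, 0.0], [0.3, 0.95]),     # cosine ~0.30: dropped
]
print(select_training_paraphrases(cands))
# ['who established the company']
```

The same function with `threshold=0.83` would cover the filtering step for the adversarial training data.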

2. Adversarial paraphrased training data (built like the Adversarial Paraphrased Test Set)
• Use Flair (Akbik et al., 2018), trained on the OntoNotes dataset [8], which contains 12 named entity classes, to label the named entity class of the answer.
• Extract sentences from the context that contain named entities of the same type.
• Parse these sentences to obtain noun and verb phrases, which form the paraphrase suggestions; each suggestion contains at least two words and does not overlap with the answer.
• Use the paraphrasing model to paraphrase the questions.
• Keep paraphrases with a similarity score above 0.83. This is lower than the 0.9 used above because context words that could be very different from the question words should be allowed to appear in the generated paraphrase.
This yields an additional 25,000 paraphrased training examples.
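The suggestion-construction bullets can be sketched as follows. The NER labels and the noun/verb phrases are assumed to be precomputed (by Flair and a syntactic parser in the paper; typed in by hand here), and the `Sentence` type and `adversarial_suggestions` function are hypothetical. One assumption goes beyond the bullets: a sentence qualifies only if its same-type entity differs from the answer, following the idea of a wrong answer candidate.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Sentence:
    """A context sentence with precomputed annotations (hypothetical structure)."""
    text: str
    entities: List[Tuple[str, str]]  # (entity text, NER label), e.g. from an OntoNotes tagger
    phrases: List[str] = field(default_factory=list)  # noun/verb phrases, e.g. from a parser

def adversarial_suggestions(sentences: List[Sentence], answer: str, answer_label: str) -> List[str]:
    """Collect phrases from sentences that contain a same-type, non-answer entity."""
    answer_words = set(answer.lower().split())
    suggestions = []
    for sent in sentences:
        # Assumption: the same-type entity must differ from the answer (a wrong candidate).
        has_distractor = any(
            label == answer_label and ent.lower() != answer.lower()
            for ent, label in sent.entities
        )
        if not has_distractor:
            continue
        for phrase in sent.phrases:
            words = set(phrase.lower().split())
            # At least two words, and no word shared with the answer.
            if len(phrase.split()) >= 2 and not words & answer_words:
                suggestions.append(phrase)
    return suggestions

sents = [
    Sentence(
        "Acme was founded by Ann Lee and Bo Chen.",
        [("Ann Lee", "PERSON"), ("Bo Chen", "PERSON")],
        ["was founded by", "founded", "Ann Lee", "Bo Chen"],
    ),
    Sentence("The company is based in Oslo.", [("Oslo", "GPE")], ["is based in Oslo", "The company"]),
]
print(adversarial_suggestions(sents, "Ann Lee", "PERSON"))
# ['was founded by', 'Bo Chen']
```

The resulting suggestions would then go to the paraphrasing model, and its outputs through the 0.83 similarity filter.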

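Applying the paper's data-processing method yields a robustness training set, and the model is retrained on the training data augmented with it. Tables 3, 4, 5 and 6 of the paper all show considerable improvements, especially on the adversarial examples.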
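A minimal sketch of the augmentation step, assuming each example is a dictionary with `question`, `context` and `answer` keys (a hypothetical schema, not SQuAD's exact JSON). A paraphrased example reuses the source example's context and answer and changes only the question. Simple concatenation plus shuffling is an assumption here; the post does not say how the two sets are combined.

```python
import random
from typing import Dict, List

Example = Dict[str, str]  # hypothetical schema: {"question", "context", "answer"}

def paraphrased_copy(example: Example, new_question: str) -> Example:
    """A paraphrased example keeps the source context and answer; only the question changes."""
    return {**example, "question": new_question}

def augment(original: List[Example], paraphrased: List[Example], seed: int = 0) -> List[Example]:
    """Combine original and paraphrased examples into one shuffled training set."""
    combined = original + paraphrased  # new list; the inputs are left untouched
    random.Random(seed).shuffle(combined)
    return combined

train = [{"question": "Who founded Acme?", "context": "Acme was founded by Ann Lee.", "answer": "Ann Lee"}]
extra = [paraphrased_copy(train[0], "Who established Acme?")]
augmented = augment(train, extra)
print(len(augmented))  # 2
```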

Below is some related work for reference:
Adversarial Examples for Question Answering
Jia and Liang [1]:
append a distracting sentence to the end of a passage.
Drawback:
the adversarial examples created are unnatural and not expected to be present in naturally occurring passages.
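The core operation here, appending a sentence to the passage, can be sketched as below. In Jia and Liang's work the distracting sentence is generated systematically; in this sketch it is simply hand-written, and `append_distractor` is a hypothetical helper. Because text is only added after the passage, the gold answer's character offset stays valid.

```python
from typing import Tuple

def append_distractor(passage: str, answer_start: int, distractor: str) -> Tuple[str, int]:
    """Append a distracting sentence after the passage.

    Nothing is inserted before the answer, so its character offset is unchanged.
    """
    separator = "" if passage.endswith(" ") else " "
    return passage + separator + distractor, answer_start

passage = "Acme was founded in 1999."
start = passage.index("1999")
# Hand-written distractor: resembles the question but does not change its correct answer.
new_passage, new_start = append_distractor(passage, start, "Globex was founded in 1987.")
print(new_passage)                           # Acme was founded in 1999. Globex was founded in 1987.
print(new_passage[new_start:new_start + 4])  # 1999
```

The appended sentence is exactly the kind of text that rarely occurs in real passages, which is the drawback noted above.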

Some previous work used question paraphrasing to create more natural adversarial examples.
Ribeiro [2]:
uses back translation to obtain paraphrasing rules.
Rychalska [3]:
replaces the most important question word with a synonym obtained from WordNet and ELMo embeddings.
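A sketch of the word-replacement idea attributed to Rychalska [3], under clearly stated assumptions: the per-token importance scores and the `synonyms` dictionary are hypothetical inputs standing in for the importance measure and the WordNet/ELMo lookup of the original work, and falling back to the next most important word when the top word has no synonym is a design choice of this sketch.

```python
from typing import Dict, List

def replace_most_important(
    tokens: List[str], importance: List[float], synonyms: Dict[str, List[str]]
) -> List[str]:
    """Replace the most important token that has a synonym with its first synonym.

    importance and synonyms are hypothetical inputs standing in for the
    importance measure and WordNet/ELMo lookup of the original work.
    """
    ranked = sorted(range(len(tokens)), key=lambda i: importance[i], reverse=True)
    for i in ranked:
        options = synonyms.get(tokens[i].lower(), [])
        if options:
            return tokens[:i] + [options[0]] + tokens[i + 1:]
    return list(tokens)  # nothing replaceable: return an unchanged copy

question = ["Who", "founded", "Acme", "?"]
scores = [0.1, 0.7, 0.15, 0.05]
toy_synonyms = {"founded": ["established", "started"]}
print(replace_most_important(question, scores, toy_synonyms))
# ['Who', 'established', 'Acme', '?']
```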

Neural Paraphrasing Networks

Beyond single paraphrase generation, prior work has also studied the value of generating multiple paraphrases.
Gupta [5]: uses a variational autoencoder (VAE).

Xu [6] assumed that different paraphrasing styles use different rewriting patterns, which are represented as latent embeddings. These embeddings are used to augment the decoder's hidden state to generate different paraphrases.

This paper's approach:
a more guided approach to generating diverse paraphrases. Given k suggestions, the model can generate up to k paraphrased questions.
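The "k suggestions, up to k paraphrases" property can be illustrated as follows. `toy_model` is a hypothetical stand-in that only substitutes the suggested phrase, whereas the real network is a trained model that can rewrite more than the suggested span. In this sketch, outputs that are unchanged or duplicated are dropped, which is why the count is at most k rather than exactly k.

```python
from typing import Callable, List, Tuple

Suggestion = Tuple[str, str]  # (phrase in the question, suggested replacement)
Paraphraser = Callable[[str, Suggestion], str]  # hypothetical model interface

def generate_paraphrases(source: str, suggestions: List[Suggestion], model: Paraphraser) -> List[str]:
    """Run the model once per suggestion; keep only new, distinct outputs,
    so k suggestions yield at most k paraphrases."""
    outputs: List[str] = []
    for suggestion in suggestions:
        candidate = model(source, suggestion)
        if candidate != source and candidate not in outputs:
            outputs.append(candidate)
    return outputs

def toy_model(source: str, suggestion: Suggestion) -> str:
    """Stand-in for the trained network: plain string substitution."""
    old, new = suggestion
    return source.replace(old, new)

src = "who founded the company"
sugs = [("founded", "established"), ("company", "firm"), ("ceo", "chief executive")]
print(generate_paraphrases(src, sugs, toy_model))
# ['who established the company', 'who founded the firm']
```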

Paraphrasing as an Intermediate Task to Question Answering
Dong [7]:
generates paraphrases of the input question, then produces an answer probability distribution for each paraphrased question, weights each distribution by the score of its paraphrased question, and combines them into the overall conditional probability.
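As a rough formalization (notation assumed here, not taken from the post), this weighting amounts to marginalizing over paraphrases:

$$
P(a \mid q) = \sum_{q' \in \mathcal{H}(q)} P(q' \mid q)\, P(a \mid q')
$$

Here $\mathcal{H}(q)$ is the set of candidate paraphrases of the question $q$, $P(q' \mid q)$ is the normalized score of paraphrase $q'$, and $P(a \mid q')$ is the answer distribution the QA model produces for $q'$. The QA model must run once per element of $\mathcal{H}(q)$, which helps explain the high time cost.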
Drawback: this is relatively time-consuming.
This paper instead treats question paraphrasing as a separate task.

References

[1] Adversarial examples for evaluating reading comprehension systems. EMNLP, 2017.
[2] Semantically equivalent adversarial rules for debugging NLP models. ACL, 2018.
[3] Are you tough enough? Framework for robustness validation of machine comprehension systems.
[4] Learning to paraphrase for question answering. EMNLP, 2017.
[5] A deep generative framework for paraphrase generation. AAAI, 2018.
[6] D-PAGE: Diverse paraphrase generation. 2018.
[7] Learning to paraphrase for question answering. EMNLP, 2017.
[8] https://catalog.ldc.upenn.edu/docs/LDC2013T19/OntoNotes-Release-5.0.pdf


Reposted from blog.csdn.net/ganxiwu9686/article/details/105932061