Papers Read | Universal Adversarial Triggers for Attacking and Analyzing NLP

[code] [blog]

The main ideas and contributions

Previous adversarial attacks in NLP are tailored to a specific input; this paper asks whether there exist attacks that remain effective for any input.

The paper searches for universal adversarial triggers: input-agnostic sequences of tokens that, when concatenated to any input from a dataset, cause the model to produce a specific target prediction.

For example, triggers cause the SNLI entailment accuracy to drop from 89.94% to 0.55%, make a SQuAD model answer "to kill american people" for 72% of "why" questions, and make the GPT-2 language model produce racist output even when conditioned on non-racial contexts.

The attack is designed as a gradient-guided search over tokens. The search iteratively updates the tokens in the trigger sequence to increase the likelihood of the target prediction over batches of examples (Section 2). The authors find that short sequences successfully trigger the target prediction when concatenated to inputs for text classification, reading comprehension, and conditional text generation.

For example:

(Figure: examples of universal adversarial triggers concatenated to inputs and the target predictions they cause.)

Although the triggers are found with white-box (gradient) access to a model, they transfer to other models, so the attack does not require access to the target model.

Finally, universal attacks are a distinctive tool for model analysis because, unlike typical attacks, they are context-independent. They therefore highlight general input-output patterns learned by a model. The authors use them to study the effect of dataset biases and to identify heuristics learned by models (Section 6).

Attack model and objective

Given a model $f$, a target prediction $\tilde{y}$, and inputs $t$ drawn from a dataset $\mathcal{T}$, the attack searches for a trigger sequence $t_{adv}$ that, when concatenated to the front (or end) of every input, minimizes the loss of the target prediction over the dataset:

$$\underset{t_{adv}}{\arg\min}\; \mathbb{E}_{t \sim \mathcal{T}}\big[\mathcal{L}\big(\tilde{y},\, f(t_{adv}; t)\big)\big]$$

where $t_{adv}; t$ denotes the concatenation of the trigger and the input.
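
As a concrete illustration, here is a minimal PyTorch sketch of this objective, with a toy bag-of-embeddings classifier standing in for the real model $f$; the model, vocabulary sizes, and helper names are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

# Toy stand-in for the model f (NOT the paper's models).
vocab_size, embed_dim, num_classes = 1000, 64, 3
embedding = nn.Embedding(vocab_size, embed_dim)
classifier = nn.Linear(embed_dim, num_classes)

def trigger_loss(trigger_ids, batch_ids, target_label):
    """Average loss of the target prediction when the trigger is prepended
    to every input in the batch: an estimate of E_t[ L(y_target, f(t_adv; t)) ]."""
    batch_size = batch_ids.shape[0]
    trig = trigger_ids.unsqueeze(0).expand(batch_size, -1)  # repeat trigger per example
    inputs = torch.cat([trig, batch_ids], dim=1)             # t_adv ; t
    logits = classifier(embedding(inputs).mean(dim=1))
    targets = torch.full((batch_size,), target_label, dtype=torch.long)
    return nn.functional.cross_entropy(logits, targets)

# Example: a 3-token trigger and a batch of eight length-10 inputs.
trigger = torch.randint(0, vocab_size, (3,))
batch = torch.randint(0, vocab_size, (8, 10))
print(trigger_loss(trigger, batch, target_label=0).item())
```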

Trigger search algorithm

We first choose the trigger length: longer triggers are more effective, while shorter triggers are more stealthy. We then initialize the trigger sequence by repeating the word "the", the subword "a", or the character "a", and concatenate the trigger to the front/end of all inputs.
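
A small sketch of this initialization and concatenation step (token ids, shapes, and the assumed id for "the" are illustrative; real code would use the task's tokenizer):

```python
import torch

def init_trigger(init_token_id, trigger_length):
    """Initialize the trigger by repeating a neutral token such as "the"."""
    return torch.full((trigger_length,), init_token_id, dtype=torch.long)

def prepend_trigger(trigger_ids, batch_ids):
    """Concatenate the trigger to the front of every input in the batch."""
    trig = trigger_ids.unsqueeze(0).expand(batch_ids.shape[0], -1)
    return torch.cat([trig, batch_ids], dim=1)

# Example with an assumed vocabulary in which id 5 stands for the word "the".
trigger = init_trigger(init_token_id=5, trigger_length=3)
batch = torch.randint(0, 1000, (4, 10))       # four dummy inputs of length 10
print(prepend_trigger(trigger, batch).shape)  # torch.Size([4, 13])
```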

We then iteratively replace the tokens in the trigger to minimize the loss of the target prediction over batches of examples. To determine how to replace the current tokens, we cannot directly apply adversarial attack methods from computer vision, because tokens are discrete. Instead, we build on HotFlip (Ebrahimi et al., 2018b), a method that approximates the effect of replacing a token using the gradient. To apply this method, the trigger tokens $t_{adv}$, represented as one-hot vectors, are embedded to form $e_{adv}$.
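
The sketch below shows one way to obtain the gradient of the batch loss with respect to the trigger embeddings $e_{adv}$, which is the quantity HotFlip's approximation needs; the toy classifier again stands in for the real model and is purely illustrative.

```python
import torch
import torch.nn as nn

# Toy stand-in model (illustrative only).
vocab_size, embed_dim, num_classes = 1000, 64, 3
embedding = nn.Embedding(vocab_size, embed_dim)
classifier = nn.Linear(embed_dim, num_classes)

def trigger_embedding_grad(trigger_ids, batch_ids, target_label):
    """Gradient of the target loss w.r.t. the trigger embeddings e_adv,
    averaged over the batch."""
    batch_size = batch_ids.shape[0]
    e_adv = embedding(trigger_ids).detach().requires_grad_(True)   # (L, d)
    e_batch = embedding(batch_ids)                                  # (B, T, d)
    e_inputs = torch.cat([e_adv.unsqueeze(0).expand(batch_size, -1, -1), e_batch], dim=1)
    logits = classifier(e_inputs.mean(dim=1))
    targets = torch.full((batch_size,), target_label, dtype=torch.long)
    nn.functional.cross_entropy(logits, targets).backward()
    return e_adv.grad                                               # (L, d)

trigger = torch.randint(0, vocab_size, (3,))
batch = torch.randint(0, vocab_size, (8, 10))
print(trigger_embedding_grad(trigger, batch, target_label=0).shape)  # torch.Size([3, 64])
```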


Token replacement strategy

The HotFlip strategy used here is based on a linear approximation of the task loss. Each trigger token embedding $\mathbf{e}_{adv_i}$ is updated to minimize the loss's first-order Taylor approximation around the current token embedding:

$$\underset{\mathbf{e}'_i \in \mathcal{V}}{\arg\min}\; \left[\mathbf{e}'_i - \mathbf{e}_{adv_i}\right]^{\top} \nabla_{\mathbf{e}_{adv_i}} \mathcal{L}$$

where $\mathcal{V}$ is the set of all token embeddings in the model's vocabulary and $\nabla_{\mathbf{e}_{adv_i}} \mathcal{L}$ is the average gradient of the task loss over a batch.

The optimal $\mathbf{e}'_i$ can be computed efficiently with $|\mathcal{V}|$ $d$-dimensional dot products, where $d$ is the dimensionality of the token embedding (Michel et al., 2019). For all the models considered, this brute-force solution is trivially parallelizable and cheaper than running a forward pass. Finally, after finding each optimal $\mathbf{e}_{adv_i}$, the embeddings are converted back to their associated tokens. Figure 1 of the paper illustrates the trigger search algorithm.
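
A sketch of this candidate scoring, computed as one matrix of dot products over the whole vocabulary (tensor names and sizes are illustrative):

```python
import torch

def hotflip_candidates(avg_grad, trigger_embeds, embedding_matrix, k=1):
    """Rank replacement tokens for each trigger position by the first-order
    score [e' - e_adv_i]^T grad_i; lower scores predict a lower loss.

    avg_grad:         (L, d) batch-averaged gradient of the loss w.r.t. e_adv
    trigger_embeds:   (L, d) current trigger embeddings
    embedding_matrix: (|V|, d) all token embeddings in the vocabulary
    Returns the top-k candidate token ids per position, shape (L, k).
    """
    # (L, |V|) dot products: one score per (position, candidate token) pair.
    scores = avg_grad @ embedding_matrix.T - (avg_grad * trigger_embeds).sum(-1, keepdim=True)
    return scores.topk(k, dim=-1, largest=False).indices

# Example with random stand-ins for a real model's quantities.
V, d, L = 1000, 64, 3
emb = torch.randn(V, d)
trig_ids = torch.randint(0, V, (L,))
print(hotflip_candidates(torch.randn(L, d), emb[trig_ids], emb, k=5).shape)  # torch.Size([3, 5])
```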

We augment this token replacement strategy with beam search.

For each token position in the trigger, we consider the top-$k$ token candidates from Equation 2. We search from the leftmost position to the rightmost and score each beam using its loss on the current batch. Due to computational constraints (Appendix A), we use small beam sizes; increasing them could improve the results.
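
A sketch of this left-to-right beam search, assuming a batch-loss function and a candidate matrix like those sketched above (the dummy loss at the bottom is only a stand-in so the example runs):

```python
import torch

def beam_search_trigger(trigger_ids, candidates, loss_fn, beam_size=3):
    """Left-to-right beam search over trigger positions.

    trigger_ids: (L,)   current trigger token ids
    candidates:  (L, k) top-k replacement ids per position (e.g. from HotFlip)
    loss_fn:     maps a (L,) trigger to a scalar loss on the current batch
    """
    beams = [(loss_fn(trigger_ids), trigger_ids)]
    for pos in range(trigger_ids.shape[0]):
        expanded = []
        for _, trig in beams:
            for cand in candidates[pos]:
                new_trig = trig.clone()
                new_trig[pos] = cand
                expanded.append((loss_fn(new_trig), new_trig))
        # Keep the beam_size triggers with the lowest loss on the current batch.
        beams = sorted(expanded, key=lambda b: b[0])[:beam_size]
    return beams[0][1]

# Example with a dummy loss (stand-in for the real batch loss).
torch.manual_seed(0)
target = torch.randint(0, 1000, (3,))
dummy_loss = lambda t: (t.float() - target.float()).abs().sum().item()
cands = torch.randint(0, 1000, (3, 5))
best = beam_search_trigger(torch.zeros(3, dtype=torch.long), cands, dummy_loss)
print(best, dummy_loss(best))
```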

We also attack contextualized ELMo embeddings and subword models that use byte pair encoding. This brings challenges not handled in prior work; for example, ELMo embeddings change depending on context. We describe how these attacks are handled in Appendix A.

 
