[Deep Learning] Adversarial Training in NLP

        In NLP, adversarial training is usually applied to the embedding layer (word embeddings, position embeddings, segment embeddings, etc.). The idea is simple: add a small perturbation to the embeddings so that the model becomes more robust and generalizes better. The sections below walk through some common adversarial training algorithms in NLP, together with concrete code.
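
        As a minimal sketch of this idea (a toy illustration only, not one of the algorithms below; here the perturbation is random noise, whereas FGM and PGD compute it from the gradient):

import torch
import torch.nn as nn

emb = nn.Embedding(100, 8)            # toy embedding layer
backup = emb.weight.data.clone()      # back up the original weights
emb.weight.data += 0.01 * torch.randn_like(emb.weight)  # perturb in place
# ... run the forward/backward pass on the perturbed embeddings ...
emb.weight.data = backup              # restore the original weights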

1. Fast Gradient Method (FGM)

        The idea of FGM is to perturb the word embeddings in the direction of the gradient; the size of the perturbation is a tunable hyperparameter. The perturbed inputs then serve as additional adversarial examples for training, which improves the model's performance. Since each sample triggers one extra forward/backward pass after the perturbation is added, training time roughly doubles with FGM.
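
        Written as a formula (a standard formulation of this perturbation; epsilon and g correspond to the epsilon and param.grad used in the code below):

r_{\mathrm{adv}} = \epsilon \cdot \frac{g}{\lVert g \rVert_2}, \qquad g = \nabla_{x} L(\theta, x, y)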

        On top of the original training loop, FGM mainly adds the following operations: back up the embedding parameters and add the perturbation, compute the loss on the perturbed input, backpropagate so that the adversarial gradient accumulates on top of the normal one, and finally restore the original embedding parameters.

1.1 Algorithm process

For each x:
  1. Compute the forward loss for x and backpropagate to get the gradient.
  2. Compute r from the gradient of the embedding matrix and add it to the current embedding, which is equivalent to x + r.
  3. Compute the forward loss for x + r and backpropagate to get the adversarial gradient, accumulating it onto the gradient from (1).
  4. Restore the embedding to its value at (1).
  5. Update the parameters with the gradient from (3).

1.2 Specific code

import torch


class FGM():
    def __init__(self, model):
        self.model = model
        self.backup = {}

    def attack(self, epsilon=1., emb_name='word_embeddings'):
        # Replace emb_name with the name of the embedding parameter in your model.
        for name, param in self.model.named_parameters():
            if param.requires_grad and emb_name in name:
                # Back up the original embedding weights before perturbing them.
                self.backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm != 0 and not torch.isnan(norm):
                    # Step of size epsilon along the L2-normalized gradient.
                    r_at = epsilon * param.grad / norm
                    param.data.add_(r_at)

    def restore(self, emb_name='word_embeddings'):
        # Replace emb_name with the name of the embedding parameter in your model.
        for name, param in self.model.named_parameters():
            if param.requires_grad and emb_name in name:
                assert name in self.backup
                param.data = self.backup[name]
        self.backup = {}

1.3 Specific usage

fgm = FGM(model)  # (#1) initialize
for batch_input, batch_label in data:
    loss = model(batch_input, batch_label)  # normal training
    loss.backward()  # backpropagate to get the normal gradients
    # adversarial training
    fgm.attack()  # (#2) add the adversarial perturbation to the embedding
    loss_adv = model(batch_input, batch_label)  # (#3) loss on the perturbed adversarial example
    loss_adv.backward()  # (#4) backpropagate, accumulating the adversarial gradients onto the normal ones
    fgm.restore()  # (#5) restore the embedding parameters
    # gradient descent: update the parameters
    optimizer.step()
    model.zero_grad()
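
        A quick way to find the right emb_name is to print the model's parameter names. The sketch below assumes a Hugging Face BERT model ('bert-base-uncased' is just an example); the default emb_name='word_embeddings' matches the parameter 'embeddings.word_embeddings.weight':

from transformers import BertModel

model = BertModel.from_pretrained('bert-base-uncased')
# Print embedding-related parameter names to pick a suitable emb_name.
for name, _ in model.named_parameters():
    if 'embeddings' in name:
        print(name)  # e.g. embeddings.word_embeddings.weight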

2. Projected Gradient Descent (PGD)

        Projected Gradient Descent (PGD) is an iterative attack algorithm. Whereas FGM takes a single step, PGD runs multiple iterations, taking a small step each time and projecting the perturbation back into a specified range after every step. The constraint region is a ball of radius epsilon centered at the original embedding of the input sample, so the accumulated perturbation can never leave the ball. Note that with K attack steps, each batch costs K + 1 forward/backward passes, so PGD is correspondingly slower than FGM.
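
        Written as a formula (a standard way to state the PGD iterate; alpha is the per-step size and \Pi projects onto the epsilon-ball around the original point x, which in this post is the embedding weights):

x_{t+1} = \Pi_{\lVert x' - x \rVert_2 \le \epsilon}\!\left( x_t + \alpha \cdot \frac{g_t}{\lVert g_t \rVert_2} \right), \qquad g_t = \nabla_{x_t} L(\theta, x_t, y)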

2.1 Algorithm process

For each x:
  1. Compute the forward loss for x, backpropagate to get the gradient, and back it up.
  For each step t:
    2. Compute r from the gradient of the embedding matrix and add it to the current embedding, which is equivalent to x + r (if it leaves the allowed range, project it back into the epsilon-ball).
    3. If t is not the last step: zero the gradients, then run forward and backward on x + r to get a new gradient.
    4. If t is the last step: restore the gradient from (1), compute the final x + r, and accumulate its gradient onto (1).
  5. Restore the embedding to its value at (1).
  6. Update the parameters with the gradient from (4).

2.2 Specific code

import torch


class PGD():
    def __init__(self, model):
        self.model = model
        self.emb_backup = {}
        self.grad_backup = {}

    def attack(self, epsilon=1., alpha=0.3, emb_name='word_embeddings', is_first_attack=False):
        # Replace emb_name with the name of the embedding parameter in your model.
        for name, param in self.model.named_parameters():
            if param.requires_grad and emb_name in name:
                if is_first_attack:
                    # Back up the original embedding weights on the first attack step.
                    self.emb_backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm != 0 and not torch.isnan(norm):
                    # Take a small step of size alpha along the normalized gradient,
                    # then project back into the epsilon-ball around the original weights.
                    r_at = alpha * param.grad / norm
                    param.data.add_(r_at)
                    param.data = self.project(name, param.data, epsilon)

    def restore(self, emb_name='word_embeddings'):
        # Replace emb_name with the name of the embedding parameter in your model.
        for name, param in self.model.named_parameters():
            if param.requires_grad and emb_name in name:
                assert name in self.emb_backup
                param.data = self.emb_backup[name]
        self.emb_backup = {}

    def project(self, param_name, param_data, epsilon):
        # If the accumulated perturbation exceeds epsilon in L2 norm, rescale it onto the ball.
        r = param_data - self.emb_backup[param_name]
        if torch.norm(r) > epsilon:
            r = epsilon * r / torch.norm(r)
        return self.emb_backup[param_name] + r

    def backup_grad(self):
        # Save the gradients from the clean forward/backward pass
        # (guard against parameters that received no gradient).
        for name, param in self.model.named_parameters():
            if param.requires_grad and param.grad is not None:
                self.grad_backup[name] = param.grad.clone()

    def restore_grad(self):
        # Restore the saved clean gradients before the final adversarial backward pass.
        for name, param in self.model.named_parameters():
            if param.requires_grad and name in self.grad_backup:
                param.grad = self.grad_backup[name]
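
        A quick sanity check of the project method (a standalone toy example, not part of the class): a perturbation whose L2 norm exceeds epsilon is rescaled back onto the epsilon-ball.

import torch

epsilon = 1.0
original = torch.zeros(4)                               # stand-in for the backed-up embedding
perturbed = original + torch.tensor([3., 0., 0., 0.])   # perturbation of norm 3
r = perturbed - original
if torch.norm(r) > epsilon:
    r = epsilon * r / torch.norm(r)                     # rescale onto the epsilon-ball
print(original + r)                                     # tensor([1., 0., 0., 0.])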

2.3 Specific usage

pgd = PGD(model)
K = 3
for batch_input, batch_label in data:
    # normal training
    loss = model(batch_input, batch_label)
    loss.backward()  # backpropagate to get the normal gradients
    pgd.backup_grad()
    # K adversarial steps: after each perturbation, do one forward/backward pass, accumulating gradients
    for t in range(K):
        pgd.attack(is_first_attack=(t == 0))  # perturb the embedding; back up param.data on the first attack
        if t != K - 1:
            model.zero_grad()  # intermediate steps only need the attack gradient
        else:
            pgd.restore_grad()  # last step: restore the normal gradients so the adversarial ones accumulate onto them
        loss_adv = model(batch_input, batch_label)
        loss_adv.backward()  # backpropagate; on the last step this adds the adversarial gradients on top of the normal ones
    pgd.restore()  # restore the embedding parameters
    # gradient descent: update the parameters
    optimizer.step()
    model.zero_grad()

