In the text summarization project, the loss is masked: if the loss at a position is 0, no gradient update is performed for that position. How should the <PAD> tokens introduced by padding be handled?

The code is as follows:

import torch

# word_to_id and criterion are assumed to be defined at module level
# (criterion is shown at the end of this post)
def loss_function(pred, real):  # pred: [64, 40, 32217], real: [64, 40]

    # Look up the index values of the <PAD> and <UNK> tokens
    pad_index = word_to_id['<PAD>']  # e.g. pad_index = 40
    unk_index = word_to_id['<UNK>']  # e.g. unk_index = 42

    # Build pad_mask / unk_mask from the ground-truth labels
    pad_mask = torch.eq(real, pad_index)  # [64, 40] --> [[False, False, ...]]
    unk_mask = torch.eq(real, unk_index)  # [64, 40] --> [[False, False, ...]]
    # Combined mask: 1 for valid tokens, 0 for invalid ones
    # (after the logical_not, the 0 positions are exactly the <PAD>/<UNK> positions)
    mask = torch.logical_not(torch.logical_or(pad_mask, unk_mask))  # mask: [64, 40] [[True, True, ...False...], ...]

    # Compute the loss; transpose pred so the class dimension C comes right after N
    pred2 = pred.transpose(2, 1)  # [64, 40, 32217] ---> [64, 32217, 40]
    loss_ = criterion(pred2, real)  # inputs: [64, 32217, 40] and [64, 40]; output: [64, 40]

    # Mask out the loss produced at <PAD>/<UNK> positions
    loss_ = loss_ * mask

    # Compute the batch-average loss: summing the mask gives the number of valid tokens
    num_valid = mask.sum()
    loss = torch.sum(loss_) / num_valid

    # Return the batch-average loss
    return loss
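For reference, here is a minimal sketch of how the function might be called. The word_to_id dictionary below is a hypothetical stand-in (only the two special tokens), and the tensors are random placeholders for the model logits and the reference summaries; criterion is created exactly as shown at the end of the post:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(reduction='none')
word_to_id = {'<PAD>': 40, '<UNK>': 42}      # hypothetical minimal vocabulary mapping

pred = torch.randn(64, 40, 32217)            # model logits: [batch, seq_len, vocab]
real = torch.randint(0, 32217, (64, 40))     # reference token ids: [batch, seq_len]

loss = loss_function(pred, real)             # scalar averaged over valid (non-PAD/UNK) tokens
print(loss.item())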

Masking the loss at <PAD> / <UNK> positions

loss_ = loss_ * mask

The mask is a matrix of 0s and 1s with the same shape as the per-token loss. loss_ = loss_ * mask applies this mask to the computed loss, zeroing out the loss values at the '<PAD>' and '<UNK>' positions; a loss value of 0 at a position means that position does not update the gradient.
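As a toy illustration (the numbers here are made up, not from the project), the element-wise multiplication zeroes out exactly the entries at invalid positions, and dividing by mask.sum() averages over the valid tokens only:

import torch

loss_ = torch.tensor([[0.7, 1.2, 0.3],
                      [0.9, 0.4, 1.1]])      # per-token losses
mask  = torch.tensor([[1., 1., 0.],
                      [1., 0., 0.]])         # 0 marks <PAD>/<UNK> positions

print(loss_ * mask)                          # [[0.7, 1.2, 0.0], [0.9, 0.0, 0.0]]
print((loss_ * mask).sum() / mask.sum())     # average over the 3 valid tokens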

Why does a loss value of 0 at a position mean that this position does not update the gradient?

Reason:

$loss = \Delta y = \hat{y} - y$
$k = \Delta y / \Delta x$

A loss value of 0 means $\Delta y = 0$, so the gradient $k = \Delta y / \Delta x = 0$, and the update $w = w - \alpha \cdot \Delta y / \Delta x = w$ leaves $w$ unchanged;
so the weight is not updated.
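A small PyTorch experiment (not from the original post) makes the same point with autograd: the logits at a masked position receive an exactly zero gradient, so any weight that only influences that position is left unchanged by the optimizer step:

import torch
import torch.nn.functional as F

logits = torch.randn(1, 5, 3, requires_grad=True)   # [batch, vocab, seq_len]
targets = torch.tensor([[1, 2, 0]])                  # last token plays the role of <PAD>
mask = torch.tensor([[1., 1., 0.]])                  # 0 at the padded position

loss_per_token = F.cross_entropy(logits, targets, reduction='none')  # [1, 3]
loss = (loss_per_token * mask).sum() / mask.sum()
loss.backward()

print(logits.grad[0, :, 2])   # all zeros: the masked position contributes no gradient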


Of course, to use this technique, the criterion must be configured as follows:

# Use cross-entropy with reduction='none' instead of the default mean,
# because the loss produced at <PAD> positions must be masked out manually
criterion = nn.CrossEntropyLoss(reduction='none')
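To see the difference, here is a quick check with the same shapes as above: with reduction='none' the criterion returns one loss value per token, which is what makes the manual masking possible, whereas the default already averages everything into a scalar:

import torch
import torch.nn as nn

pred2 = torch.randn(64, 32217, 40)          # [batch, vocab, seq_len]
real = torch.randint(0, 32217, (64, 40))    # [batch, seq_len]

per_token = nn.CrossEntropyLoss(reduction='none')(pred2, real)
averaged  = nn.CrossEntropyLoss()(pred2, real)   # default reduction='mean'

print(per_token.shape)   # torch.Size([64, 40]) -- one loss per token, can still be masked
print(averaged.shape)    # torch.Size([]) -- already a scalar, too late to mask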

Origin blog.csdn.net/wtl1992/article/details/131607789