Identical outputs after fine-tuning a pretrained model (summary: when a trained model's outputs are all identical, suspect either a parameter-loading problem or a model-structure problem)


Today's training run surfaced exactly this problem: after pretraining, fine-tuning produced identical predictions for every input.

Hypothesis 1: the trailing padding is wrong

inputs = 
{'input_ids': tensor([[   2,  136,    4,  149,  149,   38,  171,    4, 2062,    3,   16,   23,
          148,    4, 8249,    3],
        [   2,   33, 3044,  130,  276,   33,   23,   68,    3,  130,  276,   33,
           23,  215,  216,    3],
        [   2,   16,  624,   33, 1023,  129,   14,  129,    3,   33, 1753,   33,
          265, 1940,    4,    3],
        [   2,  109,  104,    4,    4,   65,   47,   68,   20,    3,  641,   33,
           65,   47,   68,    3],
        [   2,  441,  449,   14,    4,  973,   33,    4,   16,    3,   33,  443,
           16,   10, 1100,    3],
        [   2, 1620,  133,  584,  355,  335,    4,  771,    3,  136,  137,    4,
          335,  469,  771,    3],
        [   2,    1, 6652,  726, 2813,  811, 1903,    4,    3, 1709,    4,  350,
          249, 1180, 6652,    3],
        [   2,   16,   14,   27,  129,    4,    3,   16,    4,  220,    4,    4,
            9,   10,  591,    3],
        [   2,    1,   27,   43,   13,  772,  543,    3,  130,   27,   43,   13,
          772,  543,   79,    3],
        [   2,  908,   33,    4,  443,   16,   15,    3,   33,   14,  443,   16,
           15,    7,  495,    3]], device='cuda:0'), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:0'), 'attention_mask': tensor([[True, True, True, True, True, True, True, True, True, True, True, True,
         True, True, True, True],
        [True, True, True, True, True, True, True, True, True, True, True, True,
         True, True, True, True],
        [True, True, True, True, True, True, True, True, True, True, True, True,
         True, True, True, True],
        [True, True, True, True, True, True, True, True, True, True, True, True,
         True, True, True, True],
        [True, True, True, True, True, True, True, True, True, True, True, True,
         True, True, True, True],
        [True, True, True, True, True, True, True, True, True, True, True, True,
         True, True, True, True],
        [True, True, True, True, True, True, True, True, True, True, True, True,
         True, True, True, True],
        [True, True, True, True, True, True, True, True, True, True, True, True,
         True, True, True, True],
        [True, True, True, True, True, True, True, True, True, True, True, True,
         True, True, True, True],
        [True, True, True, True, True, True, True, True, True, True, True, True,
         True, True, True, True]], device='cuda:0'), 'labels': tensor([[-100, -100,  137, -100, -100, -100, -100,   33, -100, -100, -100, -100,
         -100,  148,  123, -100],
        [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
         -100, -100, -100, -100],
        [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
         -100, -100,  278, -100],
        [-100, -100, -100,  105,   33, -100, -100, -100, -100, -100, -100, -100,
         -100, -100, -100, -100],
        [-100, -100, -100, -100,  822, -100, -100, 2742, -100, -100, -100, -100,
         -100, -100, -100, -100],
        [-100, -100, -100, -100,  355, -100,  469, -100, -100, -100, -100,  355,
         -100, -100, -100, -100],
        [-100, -100, -100, -100, -100, -100, 1903,  297, -100, -100,  606, -100,
         -100, -100, -100, -100],
        [-100, -100, -100, -100, -100,   62, -100, -100, 5357, -100,   14,   27,
         -100, -100, -100, -100],
        [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100,   43, -100,
         -100, -100, -100, -100],
        [-100, -100, -100,   14, -100, -100, -100, -100, -100, -100, -100, -100,
         -100, -100, -100, -100]], device='cuda:0')}

Previously I filled the trailing padding of `labels` with 0; now it is filled with -100, and this difference directly affects the cross-entropy loss.

Padding the labels with 0 and padding them with -100 behave very differently. -100 is the ignore index of the cross-entropy loss: positions labeled -100 contribute nothing to the loss at all. With 0, by contrast, every padded position is treated as a real target whose correct token is 0, so a loss is computed there and backpropagated. Because padded positions make up a large share of each batch, the model drifts toward predicting token 0 everywhere, and eventually toward predicting the same output regardless of what input it receives.
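The difference can be shown in a few lines. This is a minimal sketch with made-up 3-class logits, relying on the fact that -100 is the default `ignore_index` of PyTorch's `cross_entropy`:

```python
import torch
import torch.nn.functional as F

# Hypothetical logits for 4 sequence positions over a 3-token vocabulary;
# the last 2 positions are padding.
logits = torch.randn(4, 3)

# Labels padded with -100: those positions are skipped entirely,
# because -100 is the default ignore_index of cross_entropy.
labels_ignore = torch.tensor([2, 1, -100, -100])
loss_ignore = F.cross_entropy(logits, labels_ignore)

# Labels padded with 0: the padded positions become real targets,
# so the model is pushed to predict token 0 there.
labels_zero = torch.tensor([2, 1, 0, 0])
loss_zero = F.cross_entropy(logits, labels_zero)

# loss_ignore averages over the 2 real positions only; loss_zero
# averages over all 4, two of which reward predicting token 0 --
# a spurious training signal that grows with the amount of padding.
```

With `ignore_index`, the loss is identical to computing cross-entropy over the non-padded positions alone, which is exactly the behavior the fine-tuning labels need.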

Insight 2: if a model starts out producing the same prediction no matter what the input is, then the gradient updates it receives during training are also the same for every batch, so after training the outputs will most likely still be identical.
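One concrete way this can happen is a collapsed hidden layer. The sketch below is a contrived toy (not the actual model from this post): a large negative bias kills every ReLU unit, so the logits reduce to the output bias and ignore the input entirely, and the gradient of the output bias is then identical across different inputs with the same label:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy 2-layer net whose hidden layer has "died": a large negative bias
# drives every ReLU pre-activation below zero for ordinary inputs.
w1 = torch.randn(3, 4)
b1 = torch.full((4,), -100.0)   # kills the ReLU
w2 = torch.randn(4, 2, requires_grad=True)
b2 = torch.zeros(2, requires_grad=True)

def logits_and_bias_grad(x, label):
    h = torch.relu(x @ w1 + b1)          # all zeros: the model ignores x
    logits = h @ w2 + b2                 # reduces to b2
    loss = F.cross_entropy(logits, label)
    (gb2,) = torch.autograd.grad(loss, [b2])
    return logits.detach(), gb2

label = torch.tensor([1])
logits_a, grad_a = logits_and_bias_grad(torch.randn(1, 3), label)
logits_b, grad_b = logits_and_bias_grad(torch.randn(1, 3), label)
# Two different inputs yield identical logits and identical gradients,
# so every optimizer step moves the parameters the same way and
# training cannot break the collapse.
```

The point of the toy: once the forward pass stops depending on the input, the backward pass does too, and training just reinforces the constant prediction.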

Insight 3: this morning I used a different config with maxlen=512. During weight matching, the position_embedding was never loaded into the model, and as a result every prediction came out as the same label.
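This kind of silent mismatch is easy to catch by inspecting what `load_state_dict(..., strict=False)` skips. A minimal sketch, with a hypothetical checkpoint dict that is missing the position embedding (as happens when the pretraining and fine-tuning configs disagree on max length):

```python
import torch

# Toy model with a token embedding and a position embedding.
model = torch.nn.ModuleDict({
    "token_embedding": torch.nn.Embedding(100, 8),
    "position_embedding": torch.nn.Embedding(16, 8),
})

# Hypothetical checkpoint that lacks the position embedding weights.
ckpt = {"token_embedding.weight": torch.randn(100, 8)}

# strict=False loads what it can and reports what it skipped, instead
# of silently leaving parameters at their random initialization.
result = model.load_state_dict(ckpt, strict=False)
# result.missing_keys names the parameters that stayed randomly
# initialized -- exactly the thing to check before training.
```

Printing `result.missing_keys` (or watching the warnings that libraries like Hugging Face Transformers emit on `from_pretrained`) would have flagged the missing position_embedding immediately.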


Reprinted from: blog.csdn.net/znevegiveup1/article/details/120243401