[Hung-yi Lee 2022 Machine Learning Spring] hw7_BERT (placeholder)

Grading criteria

(image: grading rubric)

Experiment log

Medium baseline

Hyperparameters: max_question 40 / max_paragraph 350 / doc_stride 300

##### TODO: Apply linear learning rate decay #####
learning_rate = learning_rate * (1.0 / (1.0 + 0.00001 * step))
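
This update rule is a reciprocal, time-based decay applied to the running learning_rate, and the new value has to be written back into the optimizer's parameter groups each step. A minimal sketch of that wiring, using a toy model and an AdamW optimizer as placeholders for the actual HW7 setup:

# Hedged sketch: apply the decay rule above by writing the new value back into
# the optimizer. The model, optimizer, and step count here are toy placeholders.
import torch

model = torch.nn.Linear(10, 2)                      # stand-in for the BERT model
learning_rate = 1e-4
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

for step in range(1000):
    # ... forward pass, loss.backward(), optimizer.step() would go here ...
    learning_rate = learning_rate * (1.0 / (1.0 + 0.00001 * step))  # the rule above
    for param_group in optimizer.param_groups:
        param_group["lr"] = learning_rate           # push the decayed lr into the optimizer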

The resulting learning-rate curve:
(image: learning-rate curve)

Training tricks

fp16_training

(image: notes on fp16_training)

Official example:
(image: official fp16 example)
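
For reference, here is a minimal sketch of fp16 mixed-precision training with Hugging Face's accelerate library, one common way to enable it. The toy model and data below are placeholders rather than the original HW7 code, and the constructor argument has changed across accelerate versions.

# Hedged sketch of fp16 training with Hugging Face accelerate (pip install accelerate).
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# Toy stand-ins for the real model and QA dataset
model = nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
dataset = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
loader = DataLoader(dataset, batch_size=8)

# fp16 requires a CUDA GPU; older accelerate versions used Accelerator(fp16=True)
accelerator = Accelerator(mixed_precision="fp16")
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for inputs, labels in loader:
    preds = model(inputs)
    loss = criterion(preds, labels)
    accelerator.backward(loss)   # replaces loss.backward()
    optimizer.step()
    optimizer.zero_grad()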

Gradient accumulation

from: https://kozodoi.me/python/deep%20learning/pytorch/tutorial/2021/02/19/gradient-accumulation.html

# batch accumulation parameter
accum_iter = 4  

# loop through enumerated batches
for batch_idx, (inputs, labels) in enumerate(data_loader):

    # extract inputs and labels
    inputs = inputs.to(device)
    labels = labels.to(device)

    # passes and weights update
    with torch.set_grad_enabled(True):
        
        # forward pass 
        preds = model(inputs)
        loss  = criterion(preds, labels)

        # normalize loss to account for batch accumulation
        loss = loss / accum_iter 

        # backward pass
        loss.backward()

        # weights update
        if ((batch_idx + 1) % accum_iter == 0) or (batch_idx + 1 == len(data_loader)):
            optimizer.step()
            optimizer.zero_grad()
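
Since the gradients of accum_iter consecutive mini-batches are summed before each optimizer.step(), the effective batch size is accum_iter times the DataLoader batch size, while peak GPU memory stays at the single-mini-batch level.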

Takeaways:

Faster training: fp16_training
Larger effective batches: gradient accumulation

Linear learning-rate decay:

from transformers import get_linear_schedule_with_warmup  # lives under https://huggingface.co/transformers/, not in the PyTorch docs
scheduler = get_linear_schedule_with_warmup(optimizer,
                                            num_warmup_steps=0,              # default value
                                            num_training_steps=total_steps)  # with num_warmup_steps=0 this is a pure linear decay
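
The scheduler then has to be stepped once per optimizer update. A small hedged sketch of the wiring, with a toy model and a dummy loss standing in for the HW7 code:

# Hedged sketch: linear decay via the transformers scheduler in a training loop.
import torch
from torch import nn
from transformers import get_linear_schedule_with_warmup

model = nn.Linear(10, 2)                                    # toy stand-in for BERT
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

total_steps = 200                                           # one scheduler step per optimizer step
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=total_steps)

for step in range(total_steps):
    loss = model(torch.randn(8, 10)).pow(2).mean()          # dummy loss
    loss.backward()
    optimizer.step()
    scheduler.step()        # lr decays linearly from the initial value toward 0
    optimizer.zero_grad()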

Ran into the Hugging Face library again: https://huggingface.co/

Reposted from blog.csdn.net/weixin_43154149/article/details/124417296