【YOLO】YOLOv5 Training Strategy 1 – Training Warmup

1. What is training warm-up?

    As we all know, the learning rate is a very important hyperparameter that directly affects the training speed and convergence of a network. Normally, the weights are randomly initialized before training starts. If the learning rate is set too high, the model will oscillate severely; if it is set too low, the network will converge too slowly.
    So what can we do about this? Some might suggest: just set the learning rate to a small value for the first dozens or hundreds of epochs, and raise it once training stabilizes. Exactly, and that is the simplest form of warmup.
    We can think of the warmup process through an analogy: the model starts out as a child. If the learning rate is too high at the beginning, it tends to jump to overly absolute conclusions; at this stage it needs a small learning rate, crossing the river by feeling for the stones and learning carefully. Once it has built up some understanding and its knowledge has reached a certain level, taking bigger steps is no longer a problem.
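
As a minimal sketch of that "simplest warmup" idea (the function name, thresholds and values below are illustrative, not from the YOLOv5 code): use a small learning rate for the first few epochs, then switch to the normal one.

def simplest_warmup_lr(epoch, warmup_epochs=5, warmup_lr=0.001, lr=0.01):
    # small learning rate while warming up, normal learning rate afterwards
    return warmup_lr if epoch < warmup_epochs else lr

# epochs 0-4 train with 0.001, epoch 5 onwards with 0.01
print([simplest_warmup_lr(e) for e in range(8)])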

2. Common training warm-ups

1. Constant Warmup

In the first 100 epochs, the learning rate increases linearly to its target value, and after 100 epochs it remains constant. The whole process looks like this:
(Figure: constant warmup learning-rate curve)

2. Linear Warmup

In the first 100 epochs, the learning rate increases linearly, and after 100 epochs it decreases linearly. The whole process looks like this:
(Figure: linear warmup learning-rate curve)

3. Cosine Warmup

In the first 100 epochs, the learning rate increases linearly, and after 100 epochs it decays following a cosine curve. The whole process looks like this:
(Figure: cosine warmup learning-rate curve)
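
A minimal sketch of the three schedules described above, assuming a target learning rate of 0.01, a 100-epoch warmup and 300 total epochs (all of these values are illustrative, not from the YOLOv5 code):

import math

def warmup_lr(epoch, mode='cosine', lr0=0.01, warmup_epochs=100, total_epochs=300):
    # Warmup phase: learning rate rises linearly from 0 to lr0.
    if epoch < warmup_epochs:
        return lr0 * epoch / warmup_epochs
    # After warmup, the three strategies differ only in how the lr decays.
    t = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)  # progress in [0, 1]
    if mode == 'constant':
        return lr0                                      # stays at lr0
    elif mode == 'linear':
        return lr0 * (1 - t)                            # linear decay to 0
    elif mode == 'cosine':
        return lr0 * 0.5 * (1 + math.cos(math.pi * t))  # cosine decay to 0

# compare the three schedules at a few epochs
for e in (0, 50, 100, 200, 299):
    print(e, [round(warmup_lr(e, m), 5) for m in ('constant', 'linear', 'cosine')])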

3. Training warm-up code in YOLOv5

    nb is the number of batches the training set is split into. For example, if nb = 60 and warmup_epochs = 3 in the hyperparameters, then nw = 3 * 60 = 180 warmup batch iterations, i.e. 3 epochs.
    This means that the first three epochs are entirely in the warmup phase.
    Note that the code enforces at least 100 warmup batch iterations; otherwise, on a very small dataset, warmup would cover too few iterations to be effective.

import numpy as np  # used below for np.interp

# number of batches the training set is split into
nb = len(train_loader)
# number of warmup iterations: max(warmup_epochs worth of batches, 100 iterations)
nw = max(round(self.hyp['warmup_epochs'] * nb), 100)

# warmup inside the training loop
for i, (imgs, targets, *_) in enumerate(train_loader):
    ni = i + nb * epoch  # number of integrated batches since training start
    self.warmup(epoch, ni, nw)  # warmup phase
# warmup function
def warmup(self, epoch, ni, nw):
    """
    Training warmup (within the first nw iterations).
    During the first nw iterations, accumulate, lr and momentum are obtained by interpolation as follows.
    """
    if ni <= nw:
        xi = [0, nw]  # x interp
        # gradient accumulation steps: ramp from 1 up to nbs / batch_size
        self.accumulate = max(1, np.interp(ni, xi, [1, self.nbs / self.batch_size]).round())
        for j, x in enumerate(self.optimizer.param_groups):
            # bias lr falls from warmup_bias_lr=0.1 to lr0,
            # all other lrs rise from 0.0 to lr0,
            # momentum ramps from warmup_momentum=0.8 to the hyp momentum=0.937
            fp = [self.hyp['warmup_bias_lr'] if j == 0 else 0.0, x['initial_lr'] * self.lr_lambda(epoch)]
            x['lr'] = np.interp(ni, xi, fp)
            if 'momentum' in x:
                fp = [self.hyp['warmup_momentum'], self.hyp['momentum']]
                x['momentum'] = np.interp(ni, xi, fp)
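
To see the effect concretely, here is a small self-contained sketch that reproduces just the interpolation logic above and prints how lr and momentum evolve during warmup. The warmup hyperparameter values match YOLOv5's defaults; nb = 60, the batch sizes and the sampled iterations are illustrative, and the per-epoch scheduler factor lr_lambda(epoch) is ignored for simplicity.

import numpy as np

hyp = {'warmup_epochs': 3, 'warmup_bias_lr': 0.1, 'warmup_momentum': 0.8,
       'momentum': 0.937, 'lr0': 0.01}
nb = 60                                          # batches per epoch (illustrative)
nw = max(round(hyp['warmup_epochs'] * nb), 100)  # = 180 warmup iterations
nbs, batch_size = 64, 16                         # nominal vs. actual batch size

for ni in (0, 45, 90, 135, 180):                 # a few iterations inside warmup
    xi = [0, nw]
    accumulate = max(1, np.interp(ni, xi, [1, nbs / batch_size]).round())
    bias_lr  = np.interp(ni, xi, [hyp['warmup_bias_lr'], hyp['lr0']])  # falls 0.1 -> lr0
    other_lr = np.interp(ni, xi, [0.0, hyp['lr0']])                    # rises 0.0 -> lr0
    momentum = np.interp(ni, xi, [hyp['warmup_momentum'], hyp['momentum']])
    print(f'ni={ni:3d}  accumulate={accumulate:.0f}  bias_lr={bias_lr:.4f}  '
          f'other_lr={other_lr:.4f}  momentum={momentum:.4f}')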

Origin blog.csdn.net/qq_21386397/article/details/131697976