YOLOv5 analysis

The network consists of three main components:
1) Backbone: a convolutional neural network that aggregates and forms image features at different granularities.
2) Neck: a series of network layers that mix and combine image features and pass them on to the prediction layer.
3) Output: predicts on the image features, generating bounding boxes and class predictions.
For YOLOv5, whether it is v5s, v5m, v5l or v5x, the Backbone, Neck and Output are the same; the only difference lies in the model's depth and width settings, as shown in the snippet below.
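For example, the parameters section at the top of each model yaml differs only in these two multipliers (the values shown here are the stock yolov5s settings; m, l and x raise them step by step):

# yolov5s.yaml (parameters section)
depth_multiple: 0.33  # scales how many times each module (e.g. BottleneckCSP) is repeated
width_multiple: 0.50  # scales the channel count of each layer
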
General structural framework: (figure omitted)
Let’s analyze them one by one:
1) Backbone
Start with the code; the backbone configuration gives a general outline:

# YOLOv5 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Focus, [64, 3]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, BottleneckCSP, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 9, BottleneckCSP, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, BottleneckCSP, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 3, BottleneckCSP, [1024, False]],  # 9
  ]

① The first layer of the Backbone is Focus. It periodically samples pixels from the high-resolution image and reconstructs them into a low-resolution one: the four adjacent positions of each 2x2 patch are stacked, focusing the w,h-dimension information into the channel space. This enlarges the receptive field of each point and reduces the loss of original information. The module is designed mainly to cut computation and speed things up.
In the author's own words: "Focus() module is designed for FLOPS reduction and speed increase, not mAP increase."

class Focus(nn.Module):
    # Focus wh information into c-space
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super(Focus, self).__init__()
        self.conv = Conv(c1 * 4, c2, k, s, p, g, act)

    def forward(self, x):  # x(b,c,w,h) -> y(b,4c,w/2,h/2)
        return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1))

The data flow through Focus in YOLOv5 is as follows:
YOLOv5 defaults to a 3x640x640 input. The slicing operation takes every second pixel in four phase-shifted patterns, cutting the image into four 3x320x320 slices; concat then joins the four slices along the channel dimension into a 12x320x320 tensor. This passes through a convolution layer with 64 kernels, giving a 64x320x320 output, which finally goes through batch normalization and LeakyReLU before entering the next convolution layer.
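A quick sketch, using a dummy input, to verify the slicing arithmetic:

import torch

x = torch.randn(1, 3, 640, 640)  # dummy 3x640x640 input with batch size 1
slices = [x[..., ::2, ::2], x[..., 1::2, ::2],
          x[..., ::2, 1::2], x[..., 1::2, 1::2]]  # four phase-shifted 3x320x320 slices
y = torch.cat(slices, 1)  # stack along the channel dimension
print(y.shape)  # torch.Size([1, 12, 320, 320])
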
② The third layer of the Backbone is the BottleneckCSP module.
The BottleneckCSP module mainly consists of two parts: Bottleneck and CSP (cross-stage partial connections).

class BottleneckCSP(nn.Module):
    # CSP Bottleneck https://github.com/WongKinYiu/CrossStagePartialNetworks
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
        super(BottleneckCSP, self).__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = nn.Conv2d(c1, c_, 1, 1, bias=False)
        self.cv3 = nn.Conv2d(c_, c_, 1, 1, bias=False)
        self.cv4 = Conv(2 * c_, c2, 1, 1)
        self.bn = nn.BatchNorm2d(2 * c_)  # applied to cat(cv2, cv3)
        self.act = nn.LeakyReLU(0.1, inplace=True)
        self.m = nn.Sequential(*[Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)])

    def forward(self, x):
        y1 = self.cv3(self.m(self.cv1(x)))
        y2 = self.cv2(x)
        return self.cv4(self.act(self.bn(torch.cat((y1, y2), dim=1))))

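For reference, the Bottleneck block stacked inside self.m looks like this (paraphrased from models/common.py of that era; Conv is the repo's conv + BN + activation wrapper used above):

import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    # Standard bottleneck: 1x1 conv -> 3x3 conv, with an optional residual shortcut
    def __init__(self, c1, c2, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, shortcut, groups, expansion
        super(Bottleneck, self).__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_, c2, 3, 1, g=g)
        self.add = shortcut and c1 == c2  # this is the flag the config's False disables

    def forward(self, x):
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))
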
In the yolov5 model configuration file, some BottleneckCSP entries carry a False argument and some do not, as follows:
[-1, 9, BottleneckCSP, [512]],

[-1, 3, BottleneckCSP, [512, False]], # 13

False here means that the shortcut connection inside the stacked Bottleneck blocks is disabled for that BottleneckCSP; the two variants differ only in whether each Bottleneck adds its input back onto its output. (A side-by-side figure of the two variants is omitted.)
③ The SPP module (spatial pyramid pooling) applies max pooling with kernel sizes 5, 9 and 13 in parallel and concatenates the results to enlarge the receptive field.
The input to SPP is 512x20x20. A 1x1 convolution layer first reduces it to 256x20x20; three parallel MaxPool layers then pool it (stride 1, so the spatial size stays 20x20), and their outputs are concatenated with the pre-pooling features to give 1024x20x20. Finally, a convolution layer with 512 kernels restores it to 512x20x20.
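For reference, the SPP implementation looks roughly like this (again paraphrased from models/common.py; Conv is the repo's conv + BN + activation wrapper):

import torch
import torch.nn as nn

class SPP(nn.Module):
    # Spatial pyramid pooling layer
    def __init__(self, c1, c2, k=(5, 9, 13)):
        super(SPP, self).__init__()
        c_ = c1 // 2  # hidden channels (512 -> 256 in the example above)
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * (len(k) + 1), c2, 1, 1)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])

    def forward(self, x):
        x = self.cv1(x)
        return self.cv2(torch.cat([x] + [m(x) for m in self.m], 1))  # concat input with the three pooled maps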

2) Neck (PANet)
PANet builds on the Mask R-CNN and FPN frameworks. It enhances information propagation and accurately preserves spatial information, which helps assign pixels properly when forming masks.
3) Loss function
Bounding-box regression: CIoU (an improvement over GIoU)
Objectness: GIoU
IoU: the intersection-over-union ratio
The GIoU formula means: first compute the area Ac of the smallest enclosing box that contains both the predicted box and the ground-truth box; then compute the IoU; then compute the fraction of Ac that is covered by neither box; finally subtract that fraction from the IoU to obtain GIoU.
As the loss function:
GIoU = IoU - (Ac - U) / Ac and L_GIoU = 1 - GIoU, where U is the union area of the two boxes.
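A minimal plain-Python sketch of that computation (not the repo's implementation), for two axis-aligned boxes given as (x1, y1, x2, y2):

def giou(b1, b2):
    # intersection area
    iw = max(0.0, min(b1[2], b2[2]) - max(b1[0], b2[0]))
    ih = max(0.0, min(b1[3], b2[3]) - max(b1[1], b2[1]))
    inter = iw * ih
    # union area U
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    union = a1 + a2 - inter
    iou = inter / union
    # area Ac of the smallest enclosing box
    ac = (max(b1[2], b2[2]) - min(b1[0], b2[0])) * (max(b1[3], b2[3]) - min(b1[1], b2[1]))
    return iou - (ac - union) / ac  # GIoU = IoU - (Ac - U) / Ac

print(giou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlapping boxes -> a value between -1 and 1
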
Classification: BCE (binary cross-entropy loss)
Loss balance weights: ciou = 0.05, giou = 1.0, bce = 0.5
Training
1) Environment
Ubuntu, Python 3.8, torch 1.6, torchvision 0.7
2) Data format fed to the network
Each line of a label file contains five numbers: the first is the class index, the second and third are the target's normalized center-point coordinates, and the fourth and fifth are the target's normalized width and height.
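For illustration, one label line for a class-0 object centered in the image, about 12% of the image wide and 9% tall, would look like this (the values are hypothetical):

0 0.500 0.500 0.120 0.090
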
3) Parameter descriptions in the configuration file

train_path: ./source_data/traindata  # training dataset
val_path: ./source_data/valdata   # validation dataset

convertor_path: ./convertor/chouyan   # path where data converted to the training format is saved

task_name: chouyan_s    # task name; the generated models are saved under this folder

names: ["xiangyan"]    # class names


gpu_ids: "2"
imgsz: 416
epochs: 50     # train for 50 epochs in total
batch_size: 4
eval_interval: 5    # save a model every 5 epochs

weights: ./weights/yolov5s.pt     # pre-trained model

#weights: /data/dj/yolov5/work_dir/chouyan_s/2020-11-14/2020-11-14_15:17:32/epoch_15.pth   # at test time, specify the model to test here

source: ./test_data/chouyan2/neg   # test dataset path; only images are needed

output_pos: output_s_pos_neg    # path where test results are saved; change as needed

4) Things to pay attention to in the training program
1) Configuration parameters

if __name__ == '__main__':
	parser = argparse.ArgumentParser()
	parser.add_argument('--cfg', type=str, default='models/yolov5s.yaml', help='model.yaml path')  # must match the pre-trained weights; with yolov5s weights the trained model is the yolov5s version
	parser.add_argument('--data', type=str, default='config.yaml', help='data.yaml path')  # name of the data configuration file
	parser.add_argument('--hyp', type=str, default='', help='hyp.yaml path (optional)')
	parser.add_argument('--epochs', type=int, default=300)
	parser.add_argument('--batch-size', type=int, default=16, help="Total batch size for all gpus.")
	parser.add_argument('--img-size', nargs='+', type=int, default=[416, 416], help='train,test sizes')  # size images are resized to; change to 640 etc. as your task requires (must be a multiple of 32)
	parser.add_argument('--rect', action='store_true', help='rectangular training')
	parser.add_argument('--resume', nargs='?', const='get_last', default=False,
						help='resume from given path/to/last.pt, or most recent run if blank.')
	parser.add_argument('--nosave', action='store_true', help='only save final checkpoint')
	parser.add_argument('--notest', action='store_true', help='only test final epoch')
	parser.add_argument('--noautoanchor', action='store_true', help='disable autoanchor check')
	parser.add_argument('--evolve', action='store_true', help='evolve hyperparameters')
	parser.add_argument('--bucket', type=str, default='', help='gsutil bucket')
	parser.add_argument('--cache-images', action='store_true', help='cache images for faster training')
	parser.add_argument('--weights', type=str, default='', help='initial weights path')
	parser.add_argument('--name', default='', help='renames results.txt to results_name.txt if supplied')
	parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
	parser.add_argument('--multi-scale', action='store_true', help='vary img-size +/- 50%%')
	parser.add_argument('--single-cls', action='store_true', help='train as single-class dataset')
	parser.add_argument('--sync-bn', action='store_true', help='use SyncBatchNorm, only available in DDP mode')
	parser.add_argument('--local_rank', type=int, default=-1, help='DDP parameter, do not modify')
	opt = parser.parse_args()

2) Model storage path

def init_logger(work_dir='./work_dir'):    # when training on Ubuntu, never write work_dir='.\\work_dir', or the path will not be found
	cur_time = datetime.datetime.now().strftime("%Y-%m-%d_%H:%M:%S")
	work_dir = os.path.join(work_dir, cur_time.split('_')[0], cur_time)

	mkdir_or_exist(os.path.abspath(work_dir))

	# log
	log_file = os.path.join(work_dir, 'log.log')
	logger = get_root_logger(log_file)
	return logger, work_dir

5) Finally, run python train.py to start training.
Testing:
Run python detect.py to run detection.
Comparison of three YOLOv5 models (figure omitted)
Training code review
1: Load the training configuration file

2: Create the model
① Import the model configuration file
self.yaml = yaml.load(f, Loader=yaml.FullLoader)
② Define the model according to the model configuration file
if nc and nc != self.yaml['nc']:
    self.yaml['nc'] = nc  # override yaml value
self.model, self.save = parse_model(deepcopy(self.yaml), ch=[ch])

 from  n    params  module                                  arguments                     
  0                -1  1      8800  models.common.Focus                     [3, 80, 3]                    
  1                -1  1    115520  models.common.Conv                      [80, 160, 3, 2]               
  2                -1  1    315680  models.common.BottleneckCSP             [160, 160, 4]                 
  3                -1  1    461440  models.common.Conv                      [160, 320, 3, 2]              
  4                -1  1   3311680  models.common.BottleneckCSP             [320, 320, 12]                
  5                -1  1   1844480  models.common.Conv                      [320, 640, 3, 2]              
  6                -1  1  13228160  models.common.BottleneckCSP             [640, 640, 12]                
  7                -1  1   7375360  models.common.Conv                      [640, 1280, 3, 2]             
  8                -1  1   4099840  models.common.SPP                       [1280, 1280, [5, 9, 13]]      
  9                -1  1  20087040  models.common.BottleneckCSP             [1280, 1280, 4, False]        
 10                -1  1    820480  models.common.Conv                      [1280, 640, 1, 1]             
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1   5435520  models.common.BottleneckCSP             [1280, 640, 4, False]         
 14                -1  1    205440  models.common.Conv                      [640, 320, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1   1360960  models.common.BottleneckCSP             [640, 320, 4, False]          
 18                -1  1    922240  models.common.Conv                      [320, 320, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1   5025920  models.common.BottleneckCSP             [640, 640, 4, False]          
 21                -1  1   3687680  models.common.Conv                      [640, 640, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1  20087040  models.common.BottleneckCSP             [1280, 1280, 4, False]        
 24      [17, 20, 23]  1     40374  models.yolo.Detect                      [1, [[15.086, 10.896, 12.185, 27.484, 28.191, 13.015], [26.104, 23.376, 53.062, 15.104, 18.795, 51.715], [45.032, 25.534, 31.85, 40.063, 49.745, 47.505]], [320, 640, 1280]]

③ Obtain the forward output
m = self.model[-1] # Detect()
As the model structure above shows, yolov5 keeps the outputs of all three scales in its last layer (Detect), so to obtain the forward result it is enough to take the output of the model's last layer.
④ Normalize the configured anchors by the stride of each scale, [8, 16, 32]:
m.anchors /= m.stride.view(-1, 1, 1)  # m.stride = [8, 16, 32]
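For example, dividing the P3 anchor set by its stride of 8 (anchor values taken from the clustered anchors shown later in this post):

import torch

anchors_p3 = torch.tensor([[15.086, 10.896], [12.185, 27.484], [28.191, 13.015]])  # in input-image pixels
stride = 8.0
print(anchors_p3 / stride)  # the same anchors expressed in P3 feature-map grid units
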
3: Load the pre-trained model parameters into the created model structure
4: Set up some training techniques:
① model, optimizer = amp.initialize(model, optimizer, opt_level='O1', verbosity=0)
The role of amp.initialize is not to improve model accuracy or training speed but to reduce GPU memory consumption.
Its behavior depends mainly on the opt_level parameter. O0 is equivalent to ordinary single-precision training. O1 uses half precision for most computations while all model parameters stay in single precision, and the few operations that work better in single precision (such as softmax) remain single precision. O2, compared with O1, also casts the model parameters to half precision. O3 runs essentially everything in half precision and mainly serves as a speed baseline. It is worth noting that regardless of whether the model uses half precision during optimization, the saved model is single precision, which guarantees the model can be used normally elsewhere.
② Learning-rate adjustment.
PyTorch provides three kinds of learning-rate adjustment strategies:
1) Ordered adjustment: equal-interval adjustment (StepLR), adjustment at chosen milestones (MultiStepLR), exponential decay (ExponentialLR) and cosine annealing (CosineAnnealingLR)
2) Adaptive adjustment: ReduceLROnPlateau
3) Custom adjustment: LambdaLR
③ Multi-GPU training
model = torch.nn.DataParallel(model)
If you call torch.nn.DataParallel here, you must configure two or more GPUs for training; if you configure only a single card, this script reports an error.
④ Keep an exponential moving average of the model parameters to make training more robust
ema = torch_utils.ModelEMA(model)
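torch_utils.ModelEMA keeps a shadow copy of the model whose weights are an exponential moving average of the live weights. A stripped-down sketch of the idea (the real class also ramps the decay up over the first updates; the decay value here is the conventional default, used illustratively):

import copy
import torch

class SimpleEMA:
    # shadow model whose weights track an exponential moving average of the live model's weights
    def __init__(self, model, decay=0.9999):
        self.ema = copy.deepcopy(model).eval()
        self.decay = decay
        for p in self.ema.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        msd = model.state_dict()
        for k, v in self.ema.state_dict().items():
            if v.dtype.is_floating_point:
                v.mul_(self.decay).add_(msd[k], alpha=1.0 - self.decay)
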
5: Processing of the training and validation datasets
create_dataloader()
6: Set some parameters for the model,
including the number of classes, the class names, the class weights, etc.
7: Anchor settings
Check whether the anchors need to be modified:
check_anchors(dataset, model=model, thr=hyp['anchor_t'], imgsz=imgsz)
If the anchor size distribution of your own dataset is not as rich as that of the coco dataset, for example in small-target detection, it is recommended to cluster new anchors from your own dataset:

# anchors:
#   - [10,13, 16,30, 33,23]  # P3/8
#   - [30,61, 62,45, 59,119]  # P4/16
#   - [116,90, 156,198, 373,326]  # P5/32
anchors:
  - [15.086,10.896,12.185,27.484,28.191,13.015]  # P3/8
  - [26.104,23.376,53.062,15.104,18.795,51.715]  # P4/16
  - [45.032,25.534,31.85,40.063,49.745,47.505]  # P5/32

Copy the kmeans anchor utility from utils to a local script and run it:

kmean_anchors(path='E:\projects\YOLO5\data\chouyan.yaml', n=9, img_size=416, thr=8.0, gen=1000, verbose=True)

Create a new xxxx.yaml file modeled on coco.yaml and set the training-set path in it. n=9 means clustering 9 anchor points (leave this unchanged); img_size is the training image size; thr is the target aspect-ratio threshold. Because the targets I need to train are long thin bars, I raised thr from coco's 4.0 to 8.0 to allow larger aspect ratios.
8: Training

for epoch in range(start_epoch, epochs):
    ...
    for i, (imgs, targets, paths, _) in enumerate(dataloader):  # iterate over batches
        ...
        imgs = imgs.to(device, non_blocking=True).float() / 255.0  # uint8 to float32
        ...
        pred = model(imgs)  # forward pass
        loss, loss_items = compute_loss(pred, targets.to(device), model)  # compute the loss
        ...
    results, correct, maps, times = test.test(...)  # evaluate on the validation set
    torch.save(...)  # save the model

Loss analysis
① Forward outputs:
The model outputs predictions at three scales, with strides 8, 16 and 32 (figure omitted).
② Compute the loss
1) tcls, tbox, indices, anchors = build_targets(p, targets, model)

def build_targets(p, targets, model):  # p: predictions at the three scales; targets: label info; det.anchors: the anchor priors
    # Build targets for compute_loss(), input targets(image,class,x,y,w,h)
    det = model.module.model[-1] if type(model) in (nn.parallel.DataParallel, nn.parallel.DistributedDataParallel) \
        else model.model[-1]  # Detect() module
    na, nt = det.na, targets.shape[0]  # number of anchors, targets
    tcls, tbox, indices, anch = [], [], [], []
    gain = torch.ones(6, device=targets.device)  # normalized to gridspace gain #shape[6]
    off = torch.tensor([[1, 0], [0, 1], [-1, 0], [0, -1]], device=targets.device).float()  # overlap offsets #shape[4,2]
    at = torch.arange(na).view(na, 1).repeat(1, nt)  # anchor tensor, same as .repeat_interleave(nt)  # shape [3, nt]

    g = 0.5  # offset
    style = 'rect4'
    for i in range(det.nl):
        anchors = det.anchors[i]   # anchors for this scale, in feature-map units  # shape [3,2]
        gain[2:] = torch.tensor(p[i].shape)[[3, 2, 3, 2]]  # xyxy gain  # shape [6]; holds this feature map's w,h so the normalized cx,cy,w,h can be scaled to it

        # Match targets to anchors
        a, t, offsets = [], targets * gain, 0  # scale the gt cx,cy,w,h to this feature level so they match this level's anchor sizes
        if nt:
            r = t[None, :, 4:6] / anchors[:, None]  # wh ratio: gt wh / anchor wh
            j = torch.max(r, 1. / r).max(2)[0] < model.hyp['anchor_t']  # compare max(r, 1/r) against model.hyp['anchor_t']; returns a bool mask
            # j = wh_iou(anchors, t[:, 4:6]) > model.hyp['iou_t']  # iou(3,n) = wh_iou(anchors(3,2), gwh(n,2))
            a, t = at[j], t.repeat(na, 1, 1)[j]  # filter out anchor/gt pairs whose wh ratio exceeds the anchor_t threshold

            # overlaps 
            gxy = t[:, 2:4]  # grid xy: the gt centers cx,cy
            z = torch.zeros_like(gxy)
            if style == 'rect2':
                j, k = ((gxy % 1. < g) & (gxy > 1.)).T
                a, t = torch.cat((a, a[j], a[k]), 0), torch.cat((t, t[j], t[k]), 0)
                offsets = torch.cat((z, z[j] + off[0], z[k] + off[1]), 0) * g
            elif style == 'rect4':
                j, k = ((gxy % 1. < g) & (gxy > 1.)).T
                l, m = ((gxy % 1. > (1 - g)) & (gxy < (gain[[2, 3]] - 1.))).T
                a, t = torch.cat((a, a[j], a[k], a[l], a[m]), 0), torch.cat((t, t[j], t[k], t[l], t[m]), 0)
                offsets = torch.cat((z, z[j] + off[0], z[k] + off[1], z[l] + off[2], z[m] + off[3]), 0) * g
            # t.shape=[9,6]: the gt count is expanded, e.g. from 3 anchor matches to 9, by adding 2 extra gt center points near each original gt center
        # Define
        b, c = t[:, :2].long().T  # image, class
        gxy = t[:, 2:4]  # grid xy
        gwh = t[:, 4:6]  # grid wh
        gij = (gxy - offsets).long()
        gi, gj = gij.T  # grid xy indices

        # Append
        indices.append((b, a, gj, gi))  # image, anchor, grid indices
        tbox.append(torch.cat((gxy - gij, gwh), 1))  # box
        anch.append(anchors[a])  # anchors
        tcls.append(c)  # class

    return tcls, tbox, indices, anch

Pitfalls during the training process:
Phenomenon: the gradients become NaN.
Possible reason: heavy weight decay with small weight coefficients makes the gradients too small; they fall outside the representable range of Apex, the PyTorch mixed-precision acceleration library, causing rounding errors.
Phenomenon: the loss stays high and the recall stays very low.
Reason: the initial learning rate is set too high.
Why does the learning rate have such a big impact on model convergence? Let's first walk through the network parameter update process.
The PyTorch model parameter update process:
1. Obtain the loss from the error between the network's forward output and the ground-truth labels.
2. Backpropagate the error via loss.backward(); PyTorch's autograd then computes the gradient W_grad of each parameter.

if mixed_precision:
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
else:
    loss.backward()

In an ordinary training program, the optimizer runs and updates the weights as soon as the gradient is obtained. But yolov5 uses the gradient accumulation trick:

if ni % accumulate == 0:
    optimizer.step()
    optimizer.zero_grad()

That is, the optimizer does not run immediately after a gradient is computed; instead, the loss of the next batch is computed and its gradient is added on top of the existing one. In this way the gradients of several batches accumulate into a single gradient, the model parameters are updated once from that accumulated gradient, and one model iteration is complete.
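A standalone toy sketch of the pattern, with a dummy model and data (dividing the loss by accumulate keeps the effective step an average over batches; whether to divide is a design choice):

import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # dummy model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
batches = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(8)]  # dummy data

accumulate = 4  # number of batches whose gradients are summed before one optimizer step
optimizer.zero_grad()
for ni, (x, y) in enumerate(batches):
    loss = loss_fn(model(x), y)
    (loss / accumulate).backward()  # gradients from successive batches add up in .grad
    if (ni + 1) % accumulate == 0:
        optimizer.step()       # one parameter update per `accumulate` batches
        optimizer.zero_grad()  # then start accumulating afresh
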
3. Update the model parameters through the optimization algorithm.
① Optimization process:
The general formula of the (gradient descent) update is:
W_data = W_data - lr * W_grad
where W_data is a model parameter, W_grad is its gradient, and lr is the learning rate. (Note the minus sign; the step() code below applies it via alpha=-group['lr'].)

optimizer = optim.SGD(pg0, lr=hyp['lr0'], momentum=hyp['momentum'], nesterov=True)

In the PyTorch framework, common optimizers apply L2-regularization weight decay to the model parameters. After weight decay the weights become smaller, which helps prevent overfitting.

    @torch.no_grad()
    def step(self, closure=None):
        """Performs a single optimization step.

        Arguments:
            closure (callable, optional): A closure that reevaluates the model
                and returns the loss.
        """
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()

        for group in self.param_groups:
            weight_decay = group['weight_decay']
            momentum = group['momentum']
            dampening = group['dampening']
            nesterov = group['nesterov']

            for p in group['params']:
                if p.grad is None:
                    continue
                d_p = p.grad
                if weight_decay != 0:
                    d_p = d_p.add(p, alpha=weight_decay)  # L2-regularization weight decay
                if momentum != 0:
                    param_state = self.state[p]
                    if 'momentum_buffer' not in param_state:
                        buf = param_state['momentum_buffer'] = torch.clone(d_p).detach()
                    else:
                        buf = param_state['momentum_buffer']
                        buf.mul_(momentum).add_(d_p, alpha=1 - dampening)
                    if nesterov:
                        d_p = d_p.add(buf, alpha=momentum)
                    else:
                        d_p = buf

                p.add_(d_p, alpha=-group['lr'])  # weight update

        return loss

② Model parameter update:

optimizer.step()

③ Reset the gradients to zero.

optimizer.zero_grad()

One gradient updates the model once; the next round of model updates requires freshly computed gradients.
4. Exponential moving-average processing of the updated model parameters

ema = torch_utils.ModelEMA(model) if rank in [-1, 0] else None

ema.update(model)

Several ways to optimize the learning rate:
① Warmup
Warmup keeps the learning rate small for the first few epochs or steps of training. Under this small warmup learning rate the model can slowly stabilize; once it is relatively stable, training switches to the preset learning rate, so the model converges faster and ends up with a better result.
It helps mitigate early overfitting to the first mini-batches, maintains the stability of the output distribution, and helps keep the deep layers of the model stable.

if ni <= nw:
    xi = [0, nw]  # x interp
    # model.gr = np.interp(ni, xi, [0.0, 1.0])  # giou loss ratio (obj_loss = 1.0 or giou)
    accumulate = max(1, np.interp(ni, xi, [1, nbs / total_batch_size]).round())
    for j, x in enumerate(optimizer.param_groups):
        # bias lr falls from 0.1 to lr0, all other lrs rise from 0.0 to lr0
        x['lr'] = np.interp(ni, xi, [0.1 if j == 2 else 0.0, x['initial_lr'] * lf(epoch)])
        if 'momentum' in x:
            x['momentum'] = np.interp(ni, xi, [0.9, hyp['momentum']])

② Learning-rate adjustment schemes
The learning-rate adjustment strategies provided by PyTorch fall into three categories:
a. Ordered adjustment: equal-interval adjustment (StepLR), adjustment at chosen milestones (MultiStepLR), exponential decay (ExponentialLR) and cosine annealing (CosineAnnealingLR).
b. Adaptive adjustment: ReduceLROnPlateau.
c. Custom adjustment: LambdaLR.
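yolov5's own schedule falls under category c: it passes a cosine-shaped lambda to LambdaLR. A minimal runnable sketch (the model and the lr0/lrf values here are illustrative; in train.py the lambda is built from hyp):

import math
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import LambdaLR

model = nn.Linear(10, 2)  # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.937, nesterov=True)
epochs, lrf = 50, 0.2  # lrf: final lr as a fraction of the initial lr
lf = lambda x: ((1 + math.cos(x * math.pi / epochs)) / 2) * (1 - lrf) + lrf  # cosine from 1.0 down to lrf
scheduler = LambdaLR(optimizer, lr_lambda=lf)

for epoch in range(epochs):
    # ... train one epoch ...
    scheduler.step()
    print(epoch, optimizer.param_groups[0]['lr'])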


Origin blog.csdn.net/jiafeier_555/article/details/109052569