Bug fixes and modifications for the pytorch-YOLOv4 project

Project address
https://github.com/Tianxiaomo/pytorch-YOLOv4

yolov4_new: the complete process
The following are the problems you may run into, with the fix for each.
Some problems remain unsolved.


The annotation files the model needs for training and testing are txt files, not json. The file format is described in the project above. The author also provides a script, tool/dataset.py, to convert yolo annotations to txt.

  • Note that the trained weight files are saved to checkpoints under base_dir by default.
  • To match the converted txt file, modify the annotation paths in cfg.py (around line 59):
Cfg.train_label = os.path.join(_BASE_DIR, 'train_yolov4_4_355.txt')
Cfg.val_label = os.path.join(_BASE_DIR, 'train_yolov4_4_355.txt')
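
For reference, converting YOLO-style labels to one line of the txt file can be sketched as follows. This is a minimal sketch, not the project's own tool. It assumes the target format is one line per image, `image_path x1,y1,x2,y2,id ...`, and that the source labels are normalized `class cx cy w h` rows; the function name `yolo_to_txt_line` is hypothetical.

```python
def yolo_to_txt_line(image_path, img_w, img_h, yolo_rows):
    """Build one annotation line: image_path x1,y1,x2,y2,id ..."""
    parts = [image_path]
    for cls_id, cx, cy, w, h in yolo_rows:
        # YOLO rows hold a normalized centre and size; convert to pixel corners.
        x1 = int(round((cx - w / 2) * img_w))
        y1 = int(round((cy - h / 2) * img_h))
        x2 = int(round((cx + w / 2) * img_w))
        y2 = int(round((cy + h / 2) * img_h))
        parts.append(f"{x1},{y1},{x2},{y2},{int(cls_id)}")
    return " ".join(parts)


# One 100x200 image with a single box of class 2 (made-up values).
line = yolo_to_txt_line("a.jpg", 100, 200, [(2, 0.5, 0.5, 0.2, 0.4)])
print(line)
```

Check the resulting lines against the format described in the project before training, since the exact layout is an assumption here.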

  • To keep the changes separate, create a new yolov4_{dataset}.cfg file in the cfg directory by copying cfg/yolov4.cfg
  • Use find-and-replace (ctrl+r) to change every classes=80 to classes={your_dataset_classes_nums}, for example:
classes=4

  • In cfg.py, change line 20 as shown below; otherwise training fails with a dimension error such as:
    shape [1,3,29,76,76] is invalid for input of size 1472880
Cfg.use_darknet_cfg = False
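
The numbers in the error message can be checked by hand. A rough sketch, assuming the usual YOLO head layout of 3 anchors per scale and 5 + classes channels per anchor (the helper `head_channels` is hypothetical):

```python
def head_channels(n_classes, n_anchors=3):
    # Each anchor predicts 4 box coordinates, 1 objectness score
    # and one score per class.
    return n_anchors * (5 + n_classes)


grid = 76 * 76
reported_size = 1472880                  # taken from the error message
actual_channels = reported_size // grid  # channels the tensor really has

print(actual_channels)
print(head_channels(80))
print(head_channels(4))
```

The tensor holds as many channels as an 80-class head, not a 4-class one. This suggests the Darknet cfg path was still building the original head: in a Darknet cfg, the filters value of the convolution before each [yolo] layer also depends on the class count, so replacing classes alone is not enough there.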

  • In dataset.py, insert a new line at about line 141. When hsv is created it is a tuple, which cannot be modified, so convert it to a list first:
hsv=list(hsv)
hsv[1] *= dsat
hsv[2] *= dexp
hsv[0] += 179 * dhue
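
The failure is plain Python behavior and can be reproduced without OpenCV. A small sketch, with a tuple of floats standing in for the three HSV channels (all values are made up for illustration):

```python
hsv = (10.0, 0.5, 0.8)   # stand-in for the H, S, V channels
dsat, dexp = 1.5, 0.5

error_message = None
try:
    hsv[1] *= dsat       # tuples do not support item assignment
except TypeError as err:
    error_message = str(err)
print("tuple:", error_message)

hsv = list(hsv)          # the fix: a list can be modified in place
hsv[1] *= dsat
hsv[2] *= dexp
print("list:", hsv)
```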

  • To add fusion_factor, multiply each up variable by 0.5 in the forward function of the Neck part in models.py. up is assigned twice, so make the following change in both places:
up = self.upsample1(x7, downsample4.size(), self.inference) * 0.5

  • After conversion to an om model, the model cannot be tested. Change Yolov4Head.forward() in models.py (around line 400) as follows (the commented-out lines are the original code)
  • Note that after this change the model can no longer be tested with pytorch, so make the change only right before exporting to onnx, and change it back afterwards
  • Because the pytorch test no longer works, you must comment out line 81 of demo_pytorch2onnx.py, detect(session, image_src, name_file), before exporting to onnx
        if self.inference:
            # y1 = self.yolo1(x2)
            # y2 = self.yolo2(x10)
            # y3 = self.yolo3(x18)
            #
            # return get_region_boxes([y1, y2, y3])
            return [x18, x10, x2]
        else:
            return [x2, x10, x18]

  • The original code evaluates the model after every epoch, which wastes time. Modify train.py at about line 416 as follows. The change is simply to move the evaluation code under the if
  • To pass the interval parameter in, add the following to get_args(**kwargs) in train.py: parser.add_argument('-interval', dest='interval', type=int, default=10, help='interval between train and val')
            if (epoch + 1) % config.interval == 0:
                if cfg.use_darknet_cfg:
                    eval_model = Darknet(cfg.cfgfile, inference=True)
                else:
                    eval_model = Yolov4(cfg.pretrained, n_classes=cfg.classes, inference=True)
                # eval_model = Yolov4(yolov4conv137weight=None, n_classes=config.classes, inference=True)
                if torch.cuda.device_count() > 1:
                    eval_model.load_state_dict(model.module.state_dict())
                else:
                    eval_model.load_state_dict(model.state_dict())
                eval_model.to(device)

                evaluator = evaluate(eval_model, val_loader, config, device)
                del eval_model

                stats = evaluator.coco_eval['bbox'].stats
                writer.add_scalar('train/AP', stats[0], global_step)
                writer.add_scalar('train/AP50', stats[1], global_step)
                writer.add_scalar('train/AP75', stats[2], global_step)
                writer.add_scalar('train/AP_small', stats[3], global_step)
                writer.add_scalar('train/AP_medium', stats[4], global_step)
                writer.add_scalar('train/AP_large', stats[5], global_step)
                writer.add_scalar('train/AR1', stats[6], global_step)
                writer.add_scalar('train/AR10', stats[7], global_step)
                writer.add_scalar('train/AR100', stats[8], global_step)
                writer.add_scalar('train/AR_small', stats[9], global_step)
                writer.add_scalar('train/AR_medium', stats[10], global_step)
                writer.add_scalar('train/AR_large', stats[11], global_step)
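
The interval logic can be tried on its own, outside train.py. A minimal sketch that reuses the same -interval flag; the helper names `build_parser` and `should_evaluate` are hypothetical and only this one option is reproduced:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('-interval', dest='interval', type=int, default=10,
                        help='interval between train and val')
    return parser


def should_evaluate(epoch, interval):
    # epoch is zero-based, so epoch + 1 is the number of completed epochs.
    return (epoch + 1) % interval == 0


args = build_parser().parse_args(['-interval', '10'])
eval_epochs = [e for e in range(30) if should_evaluate(e, args.interval)]
print(eval_epochs)  # [9, 19, 29]
```

With the default of 10, evaluation runs after the 10th, 20th and 30th epochs instead of after every one.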

  • train.py line 211: add .contiguous() so that view does not raise an error
pred_ious = bboxes_iou(pred[b].contiguous().view(-1, 4), truth_box, xyxy=False)

  • It is recommended to comment out parser.add_argument('-val_label_path', dest='val_label', type=str, default='train.txt') in get_args(**kwargs) in train.py (around line 500), so that the training and testing annotation paths from the configuration we modified earlier are used by default
  • train.py line 326: add the val label path to the startup log so you can see which test set is in use
    logging.info(f'''Starting training:
        Epochs:          {epochs}
        Batch size:      {config.batch}
        Subdivisions:    {config.subdivisions}
        Learning rate:   {config.learning_rate}
        Training size:   {n_train}
        Validation size: {n_val}
        Checkpoints:     {save_cp}
        Device:          {device.type}
        Images size:     {config.width}
        Optimizer:       {config.TRAIN_OPTIMIZER}
        Dataset classes: {config.classes}
        Train label path:{config.train_label}
        Val label path:  {config.val_label}
        Pretrained:
    ''')

  • The model-saving code provided in the project is wrong for multi-GPU training. At train.py line 454, use model.module.state_dict() to save a torch.nn.DataParallel model
                if isinstance(model, torch.nn.DataParallel):
                    torch.save(model.module.state_dict(), save_path)
                else:
                    torch.save(model.state_dict(), save_path)

  • In the main function of train.py, add code to load model weights (surprisingly, the author did not write this)
  • To support this, add the following to get_args(**kwargs): parser.add_argument('-f', '--load', dest='load', type=str, default=None, help='Load model from a .pth file')
    if cfg.use_darknet_cfg:
        model = Darknet(cfg.cfgfile)
    else:
        model = Yolov4(cfg.pretrained, n_classes=cfg.classes)
    ####################### added here
    if cfg.load is not None:
        model.load_state_dict(torch.load(cfg.load))
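
One related pitfall: a checkpoint written with model.state_dict() while the model was wrapped in torch.nn.DataParallel has every key prefixed with module., and loading it into an unwrapped model fails on the key names. A minimal sketch of stripping that prefix, using a plain dict in place of a real state dict. The helper name `strip_module_prefix` and the key names are hypothetical.

```python
def strip_module_prefix(state_dict, prefix='module.'):
    """Return a copy of state_dict with a leading prefix removed from each key."""
    cleaned = {}
    for key, value in state_dict.items():
        if key.startswith(prefix):
            key = key[len(prefix):]
        cleaned[key] = value
    return cleaned


# Stand-in for a checkpoint saved from a DataParallel-wrapped model.
old = {'module.neck.conv1.weight': 1, 'module.head.conv2.bias': 2}
cleaned = strip_module_prefix(old)
print(cleaned)
```

Under that assumption, the result of torch.load(cfg.load) would be passed through the helper before model.load_state_dict is called. Checkpoints saved with the corrected saving code do not need it.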


  • Remember to create a new environment; the mmdetection environment I used before reported a cuDNN error
  • Environment setup is not covered in detail: run pip install -r requirements.txt, then install whatever other errors report as missing

  • In any case, first make a small dataset (about 40 images) to check that your code can run both training and testing. Do not run a full round of training only to find a bug in the test code
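
A small dataset can be cut from an existing annotation file by sampling lines. A minimal sketch, assuming one image per line in the txt file; the helper name `make_subset` is hypothetical and the lines below are generated stand-ins rather than real annotations:

```python
import random


def make_subset(lines, n=40, seed=0):
    """Pick up to n non-empty annotation lines, reproducibly."""
    lines = [ln for ln in lines if ln.strip()]
    rng = random.Random(seed)
    return rng.sample(lines, min(n, len(lines)))


# Stand-in annotation lines; real ones would be read from the train txt file.
all_lines = [f"img_{i}.jpg 0,0,10,10,0" for i in range(355)]
subset = make_subset(all_lines)
print(len(subset))  # 40
```

Write the sampled lines to a new txt file and point Cfg.train_label and Cfg.val_label at it for the trial run.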

  • Results of my own model on the gta dataset, for reference only. A value of -1.000 means the test set has no objects in that size range.

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.107
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.269
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.057
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.107
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.032
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.106
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.151
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.151
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000

Training command:

python train.py -l 0.001 -g 0,1 -pretrained pth/yolov4.conv.137.pth -classes 4 -dir . -interval 10

mmdeploy: project address and an introduction on Zhihu

Project address (the repository includes introductory documentation).
Most of that documentation covers conversion to tensorrt and similar backends rather than to the intermediate onnx format:
github.com/open-mmlab/mmdeploy

The script for converting to the intermediate onnx format is:
https://github.com/open-mmlab/mmdeploy/blob/master/tools/torch2onnx.py
It contains this line:
parser.add_argument('deploy_cfg', help='deploy config path')
This argument takes a deploy config file from the configs directory of mmdeploy. The environment of the model being converted must also be installed (for example, converting an mmdetection model requires the mmdetection environment).

Introduction and walkthrough on Zhihu (six articles in total, very detailed)
https://zhuanlan.zhihu.com/p/486914187

Origin blog.csdn.net/fei_YuHuo/article/details/126715254