MMDetection: environment configuration, config file parsing, and training a custom VOC dataset


MMDetection is an open-source project for object detection tasks. It implements a large number of detection algorithms on top of PyTorch and encapsulates dataset construction, model building, and training strategy into modules, so a new algorithm can be implemented with a small amount of code by calling these modules, which greatly improves code reuse. This article records how to use MMDetection, written in fairly plain language. If you are a professional, you may prefer the following tutorials:
MMDetection Framework Introductory Tutorial
Official documentation: config file tutorial

1. Folder structure

Download the MMDetection code from GitHub; after extracting it, the directory looks like this (only the main folders are shown):

├─mmdetection-master
│  ├─build
│  ├─checkpoints            # checkpoints
│  ├─configs                # configuration files
│  ├─data                   # datasets
│  ├─demo
│  ├─dist
│  ├─docker
│  ├─docs
│  ├─mmdet                  # main mmdetection source code, including model definitions
│  ├─requirements
│  ├─resources
│  ├─src
│  ├─tests
│  ├─tools                  # main tools: training, testing, printing config files, etc.
│  └─work_dirs              # training logs and results

2. Environment configuration

  • Create an environment and install pytorch:
    conda create --name envName python=3.7
    conda activate envName
    conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=10.2 -c pytorch
  • Install mmcv following the tutorial on the official GitHub:
    pip install -U openmim
    mim install mmcv-full
  • Install mmdet:
    pip install mmdet
    Installing mmcv used to throw errors very easily, but nowadays, as long as you install the matching PyTorch version first and then install mmcv through openmim, it basically works without errors. The commands above set up a Python 3.7 environment; other Python versions should work too.
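A quick sanity check that the environment is wired up correctly is to import the three packages and confirm CUDA is visible; this is plain PyTorch/mmcv usage, nothing project-specific:

import torch
import mmcv
import mmdet

# versions should match the ones installed above
print(torch.__version__, mmcv.__version__, mmdet.__version__)
# True means PyTorch can see the GPU through cudatoolkit
print(torch.cuda.is_available())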

3. Model training

The key to training models with MMDetection is understanding the config (configuration) file. To train Faster R-CNN, for example, you only need to write a configuration file and then start training with the following command:
python tools/train.py configs/faster_rcnn/faster_rcnn_r101_fpn_2x_towervoc.py
Here configs/faster_rcnn/faster_rcnn_r101_fpn_2x_towervoc.py is the configuration file used for training; all parameter settings required during training are defined in it.
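Training logs and checkpoints go to work_dirs/ by default; tools/train.py also accepts a --work-dir argument to redirect them, and --resume-from to continue from a saved checkpoint, e.g. (the directory name here is just an illustration):

python tools/train.py configs/faster_rcnn/faster_rcnn_r101_fpn_2x_towervoc.py --work-dir work_dirs/my_towervoc_run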
When using MMDetection, keep a few points in mind:

  • Try not to modify anything other than configuration files
  • Do not change the original configuration files; to run a new task, create a new configuration file

The MMDetection project contains a lot of files. If you train a network by changing its original configuration file, or by tweaking parameters in some .py source file, you will probably have forgotten those changes after a while, and other networks that share the same modules will then misbehave the next time you use them.
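For example, instead of editing an existing config in place, a minimal sketch of a new config that inherits from it and overrides only a couple of values (the file name and the overridden numbers here are just an illustration):

# configs/faster_rcnn/my_experiment.py  (hypothetical file)
_base_ = './faster_rcnn_r50_fpn_1x_coco.py'

# only the values below differ from the original config;
# everything else is inherited untouched
data = dict(samples_per_gpu=4)
optimizer = dict(lr=0.01)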

OK, next let's go through the config file.

1. Naming rules for config files:

{model}_[model setting]_{backbone}_{neck}_[norm setting]_[misc]_[gpu x batch_per_gpu]_{schedule}_{dataset}
The meaning of each field:

{model}: model type, e.g. faster_rcnn, mask_rcnn.

[model setting]: a specific setting for the model, e.g. without_semantic for htc, moment for reppoints.

{backbone}: backbone type, e.g. r50 (ResNet-50), x101 (ResNeXt-101).

{neck}: neck type, e.g. fpn, pafpn, nasfpn, c4.

[norm_setting]: bn (Batch Normalization) is the default; other options include gn (Group Normalization) and syncbn (Synchronized Batch Normalization). gn-head/gn-neck means GN is applied only to the head or neck of the network; gn-all means GN is used throughout the model, i.e. backbone, neck, and head.

[misc]: miscellaneous settings/plugins in the model, e.g. dconv, gcb, attention, albu, mstrain.

[gpu x batch_per_gpu]: number of GPUs and samples per GPU; 8x2 is used by default.

{schedule}: training schedule; options are 1x, 2x, 20e, etc. 1x and 2x mean 12 and 24 epochs respectively; 20e is used in cascade models and means 20 epochs. For 1x the initial learning rate decays by a factor of 10 at epochs 8 and 11; for 2x, at epochs 16 and 22; for 20e, at epochs 16 and 19.

{dataset}: dataset, e.g. coco, cityscapes, voc_0712, wider_face.
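As a worked example, the config name used in the training command above decodes like this:

faster_rcnn_r101_fpn_2x_towervoc.py
# {model}    = faster_rcnn -> Faster R-CNN
# {backbone} = r101        -> ResNet-101
# {neck}     = fpn         -> Feature Pyramid Network
# {schedule} = 2x          -> 24 epochs, LR decays at epochs 16 and 22
# {dataset}  = towervoc    -> the custom VOC dataset used later in this article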
2. Config file content analysis

The config file for each network consists of four parts:

  • model settings
  • dataset settings
  • schedules
  • runtime

The official tutorial linked at the beginning of this article gives detailed line-by-line comments using the Mask R-CNN configuration file as an example, so here I only record a few things I misunderstood at first. First of all, you should learn to use the tool tools/misc/print_config.py: the parameters it prints are exactly the parameters that are finally fed to the network for training. The usage is:
python tools/misc/print_config.py configs/yolox/yolox_l_8x8_300e_coco.py
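You can do the same thing programmatically with mmcv's Config class (print_config is a thin wrapper around it); a minimal sketch:

from mmcv import Config

# fromfile resolves the whole _base_ inheritance chain described below
cfg = Config.fromfile('configs/yolox/yolox_l_8x8_300e_coco.py')
print(cfg.pretty_text)           # the fully merged config, as print_config shows it
print(cfg.data.samples_per_gpu)  # individual fields support attribute access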

1. Inherit initial parameters from _base_
A config file usually begins with a `_base_` list pointing at base config files. This means the configuration file inherits its initial parameters from those base configs; any parameter that is not redefined later keeps its default value from the base config. Take configs/yolox/yolox_l_8x8_300e_coco.py as an example: the learning-rate schedule lr_config in YOLOX is initially inherited from configs/_base_/schedules/schedule_1x.py, that is, it starts out as:

lr_config = dict(  
    policy='step',  
    warmup='linear',  
    warmup_iters=500,    # learning-rate warmup: start from warmup_ratio * lr and
    warmup_ratio=0.001,  # reach the lr defined in the optimizer after 500 iterations
    step=[8, 11])

But the learning-rate schedule printed by print_config turns out to be different. That is because, after initially inheriting lr_config from the _base_ file, the configuration file modifies it further down:

lr_config = dict(  
    _delete_=True,
    policy='YOLOX',  
    warmup='exp',  
    by_epoch=False,  
    warmup_by_epoch=True,  
    warmup_ratio=1,  
    warmup_iters=5,  # 5 epoch  
    num_last_epochs=num_last_epochs,  
    min_lr_ratio=0.05)

_delete_=True means: drop the original lr_config inherited from _base_ entirely and replace it with the new set of key-value pairs defined here. If you only want to modify some of the parameters, for example only the step, then _delete_ is not needed; just add this to the configuration file:

lr_config = dict(  
    step=[7, 10])

Note that the key-value pairs in a config file are read in order: if the same parameter is defined more than once, the definition written later overwrites the earlier one.
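To make the merge rules concrete, here is a toy sketch with two hypothetical config files:

# toy_base.py (hypothetical)
lr_config = dict(policy='step', warmup='linear', step=[8, 11])

# toy_child.py (hypothetical)
_base_ = './toy_base.py'
lr_config = dict(step=[7, 10])
# merged result: dict(policy='step', warmup='linear', step=[7, 10])
# -> step is replaced, policy and warmup are kept from the base.
# With _delete_=True added, the merged result would instead contain
# only the keys written in toy_child.py.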

2. Automatic adjustment of learning rate
The runtime config contains an auto_scale_lr setting, e.g. auto_scale_lr = dict(base_batch_size=64). At first I mistakenly thought this parameter was where you adjust the batch size. In fact, it records the total batch size that the project's default learning rate was tuned for: 8 GPUs × 8 samples per GPU = 64. If your setup is different, MMDetection can use this value to scale your initial learning rate automatically, so do not change base_batch_size to match your own batch size, and do not hand-tune the initial learning rate either. The place to adjust the batch size is samples_per_gpu (together with workers_per_gpu) in the data dict of the dataset config.
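For intuition, this is the linear scaling rule behind auto_scale_lr (whether it is applied automatically or only when a flag such as --auto-scale-lr is passed depends on the MMDetection version); a minimal sketch:

def scale_lr(base_lr, num_gpus, samples_per_gpu, base_batch_size=64):
    # linear scaling rule: the learning rate scales in proportion
    # to the ratio of your total batch size to the base batch size
    return base_lr * (num_gpus * samples_per_gpu) / base_batch_size

# e.g. a schedule tuned for 8 GPUs x 8 samples (base_batch_size=64),
# run on 1 GPU with samples_per_gpu=4:
print(scale_lr(0.01, 1, 4))  # 0.000625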

4. Model training in practice

Training on a COCO-format dataset with MMDetection is very simple, so how do you train on a VOC dataset you defined yourself? I will use the SSD model as an example. First, my dataset: it is in VOC format with three categories in total, and its folder structure is as follows:

├─TowerVoc
│  └─VOC2012
│      ├─Annotations
│      ├─ImageSets
│      │  └─Main
│      └─JPEGImages

Here I only describe how to implement it; to see exactly which parameters to change, compare the configuration files I give below with the originals (the changed places are marked in the code).
Open the configuration file for SSD512 (configs/ssd/ssd512_coco.py) and you will see that it trains on the COCO dataset by default. Following the inheritance chain of the configuration files: ssd512_coco.py inherits from ssd300_coco.py, which in turn inherits from configs/_base_/models/ssd300.py, configs/_base_/datasets/coco_detection.py, a schedule file, and default_runtime.py.
To train a custom VOC dataset, you need to create three new configuration files:

  • Copy ssd512_coco.py and name it ssd512_towervoc.py ('tower' is simply the name of my dataset).
  • Copy ssd300_coco.py and name it ssd300_voc.py.
  • Copy configs/_base_/datasets/voc0712.py and name it configs/_base_/datasets/voctower.py.

The three configuration files are listed below:
ssd512_towervoc.py

_base_ = 'ssd300_voc.py'         # change 1
input_size = 512                
model = dict(  
    neck=dict(  
        out_channels=(512, 1024, 512, 256, 256, 256, 256),  
        level_strides=(2, 2, 2, 2, 1),  
        level_paddings=(1, 1, 1, 1, 1),  
        last_kernel_size=4),  
    bbox_head=dict(  
        in_channels=(512, 1024, 512, 256, 256, 256, 256),  
        anchor_generator=dict(  
            type='SSDAnchorGenerator',  
            scale_major=False,  
            input_size=input_size,  
            basesize_ratio_range=(0.1, 0.9),  
            strides=[8, 16, 32, 64, 128, 256, 512],  
            ratios=[[2], [2, 3], [2, 3], [2, 3], [2, 3], [2], [2]])))  
# dataset settings  
dataset_type = 'VOCDataset'      # change 3
data_root = 'data/TowerVoc/'     # change 4
img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[1, 1, 1], to_rgb=True)  
train_pipeline = [  
    dict(type='LoadImageFromFile'),  
    dict(type='LoadAnnotations', with_bbox=True),  
    dict(  
        type='Expand',  
        mean=img_norm_cfg['mean'],  
        to_rgb=img_norm_cfg['to_rgb'],  
        ratio_range=(1, 4)),  
    dict(  
        type='MinIoURandomCrop',  
        min_ious=(0.1, 0.3, 0.5, 0.7, 0.9),  
        min_crop_size=0.3),  
    dict(type='Resize', img_scale=(640, 640), keep_ratio=False),  
    dict(type='RandomFlip', flip_ratio=0.5),  
    dict(  
        type='PhotoMetricDistortion',  
        brightness_delta=32,  
        contrast_range=(0.5, 1.5),  
        saturation_range=(0.5, 1.5),  
        hue_delta=18),  
    dict(type='Normalize', **img_norm_cfg),  
    dict(type='DefaultFormatBundle'),  
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),  
]  
test_pipeline = [  
    dict(type='LoadImageFromFile'),  
    dict(  
        type='MultiScaleFlipAug',  
        img_scale=(512, 512),       
        flip=False,  
        transforms=[  
            dict(type='Resize', keep_ratio=False),  
            dict(type='Normalize', **img_norm_cfg),  
            dict(type='ImageToTensor', keys=['img']),  
            dict(type='Collect', keys=['img']),  
        ])  
]  
data = dict(  
    samples_per_gpu=4,             # change this to your own batch size if needed
    workers_per_gpu=2,  
    train=dict(  
        _delete_=True,  
        type='RepeatDataset',  
        times=5,  
        dataset=dict(  
            type=dataset_type,  
            ann_file=data_root + 'VOC2012/ImageSets/Main/train.txt',   # change 5
            img_prefix=data_root + 'VOC2012/',  
            pipeline=train_pipeline)),  
    val=dict(pipeline=test_pipeline),  
    test=dict(pipeline=test_pipeline))  
# optimizer  
optimizer = dict(type='SGD', lr=2e-3, momentum=0.9, weight_decay=5e-4)  
optimizer_config = dict(_delete_=True)  
custom_hooks = [  
    dict(type='NumClassCheckHook'),  
    dict(type='CheckInvalidLossHook', interval=50, priority='VERY_LOW')  
]  
  
# evaluation = dict(interval=1, metric='mAP')  
  
# NOTE: `auto_scale_lr` is for automatically scaling LR,  
# USER SHOULD NOT CHANGE ITS VALUES.  
# base_batch_size = (8 GPUs) x (8 samples per GPU)  
auto_scale_lr = dict(base_batch_size=64)

ssd300_voc.py

_base_ = [  
    '../_base_/models/ssd300.py', '../_base_/datasets/voctower.py',    # change 1
    '../_base_/schedules/schedule_2x.py', '../_base_/default_runtime.py'  
]  
# model settings  
input_size = 300  
model = dict(  
    type='SingleStageDetector',  
    backbone=dict(  
        type='SSDVGG',  
        depth=16,  
        with_last_pool=False,  
        ceil_mode=True,  
        out_indices=(3, 4),  
        out_feature_indices=(22, 34),  
        init_cfg=dict(  
            type='Pretrained', checkpoint='open-mmlab://vgg16_caffe')),  
    neck=dict(  
        type='SSDNeck',  
        in_channels=(512, 1024),  
        out_channels=(512, 1024, 512, 256, 256, 256),  
        level_strides=(2, 2, 1, 1),  
        level_paddings=(1, 1, 0, 0),  
        l2_norm_scale=20),  
    bbox_head=dict(  
        type='SSDHead',  
        in_channels=(512, 1024, 512, 256, 256, 256),  
        num_classes=3,                                        # change 2 (my dataset has 3 classes)
        anchor_generator=dict(  
            type='SSDAnchorGenerator',  
            scale_major=False,  
            input_size=input_size,  
            basesize_ratio_range=(0.15, 0.9),  
            strides=[8, 16, 32, 64, 100, 300],  
            ratios=[[2], [2, 3], [2, 3], [2, 3], [2], [2]]),  
        bbox_coder=dict(  
            type='DeltaXYWHBBoxCoder',  
            target_means=[.0, .0, .0, .0],  
            target_stds=[0.1, 0.1, 0.2, 0.2])),  
    # model training and testing settings  
    train_cfg=dict(  
        assigner=dict(  
            type='MaxIoUAssigner',  
            pos_iou_thr=0.5,  
            neg_iou_thr=0.5,  
            min_pos_iou=0.,  
            ignore_iof_thr=-1,  
            gt_max_assign_all=False),  
        smoothl1_beta=1.,  
        allowed_border=-1,  
        pos_weight=-1,  
        neg_pos_ratio=3,  
        debug=False),  
    test_cfg=dict(  
        nms_pre=1000,  
        nms=dict(type='nms', iou_threshold=0.45),  
        min_bbox_size=0,  
        score_thr=0.02,  
        max_per_img=200))  
cudnn_benchmark = True  
  
# dataset settings  
dataset_type = 'VOCDataset'                             # change 3
data_root = 'data/TowerVoc/'  
img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[1, 1, 1], to_rgb=True)  
train_pipeline = [  
    dict(type='LoadImageFromFile'),  
    dict(type='LoadAnnotations', with_bbox=True),  
    dict(  
        type='Expand',  
        mean=img_norm_cfg['mean'],  
        to_rgb=img_norm_cfg['to_rgb'],  
        ratio_range=(1, 4)),  
    dict(  
        type='MinIoURandomCrop',  
        min_ious=(0.1, 0.3, 0.5, 0.7, 0.9),  
        min_crop_size=0.3),  
    dict(type='Resize', img_scale=(300, 300), keep_ratio=False),  
    dict(type='RandomFlip', flip_ratio=0.5),  
    dict(  
        type='PhotoMetricDistortion',  
        brightness_delta=32,  
        contrast_range=(0.5, 1.5),  
        saturation_range=(0.5, 1.5),  
        hue_delta=18),  
    dict(type='Normalize', **img_norm_cfg),  
    dict(type='DefaultFormatBundle'),  
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),  
]  
test_pipeline = [  
    dict(type='LoadImageFromFile'),  
    dict(  
        type='MultiScaleFlipAug',  
        img_scale=(300, 300),  
        flip=False,  
        transforms=[  
            dict(type='Resize', keep_ratio=False),  
            dict(type='Normalize', **img_norm_cfg),  
            dict(type='ImageToTensor', keys=['img']),  
            dict(type='Collect', keys=['img']),  
        ])  
]  
data = dict(  
    samples_per_gpu=8,  
    workers_per_gpu=3,  
    train=dict(  
        _delete_=True,  
        type='RepeatDataset',  
        times=5,  
        dataset=dict(  
            type=dataset_type,  
            ann_file=data_root + 'VOC2012/ImageSets/Main/train.txt',  # these two lines don't strictly need
            img_prefix=data_root + 'VOC2012/',                # changing: ssd512_towervoc.py overrides them
            pipeline=train_pipeline)),  
    val=dict(pipeline=test_pipeline),  
    test=dict(pipeline=test_pipeline))  
# optimizer  
optimizer = dict(type='SGD', lr=2e-3, momentum=0.9, weight_decay=5e-4)  
optimizer_config = dict(_delete_=True)  
custom_hooks = [  
    dict(type='NumClassCheckHook'),  
    dict(type='CheckInvalidLossHook', interval=50, priority='VERY_LOW')  
]  
  
# NOTE: `auto_scale_lr` is for automatically scaling LR,  
# USER SHOULD NOT CHANGE ITS VALUES.  
# base_batch_size = (8 GPUs) x (8 samples per GPU)  
auto_scale_lr = dict(base_batch_size=64)

voctower.py

# dataset settings  
dataset_type = 'VOCDataset'  
data_root = 'data/TowerVoc/'   # change this to your own dataset folder
img_norm_cfg = dict(  
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)  
train_pipeline = [  
    dict(type='LoadImageFromFile'),  
    dict(type='LoadAnnotations', with_bbox=True),  
    dict(type='Resize', img_scale=(640, 640), keep_ratio=True),  
    dict(type='RandomFlip', flip_ratio=0.5),  
    dict(type='Normalize', **img_norm_cfg),  
    dict(type='Pad', size_divisor=32),  
    dict(type='DefaultFormatBundle'),  
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),  
]  
test_pipeline = [  
    dict(type='LoadImageFromFile'),  
    dict(  
        type='MultiScaleFlipAug',  
        img_scale=(640, 640),  
        flip=False,  
        transforms=[  
            dict(type='Resize', keep_ratio=True),  
            dict(type='RandomFlip'),  
            dict(type='Normalize', **img_norm_cfg),  
            dict(type='Pad', size_divisor=32),  
            dict(type='ImageToTensor', keys=['img']),  
            dict(type='Collect', keys=['img']),  
        ])  
]  
data = dict(  
    samples_per_gpu=4,    # set your own batch_size here; for SSD it doesn't matter either way,
    workers_per_gpu=2,    # but some networks don't override this parameter, so it's best to change it here too
    train=dict(  
        type='RepeatDataset',  
        times=3,  
        dataset=dict(  
            type=dataset_type,  
            ann_file=data_root + 'VOC2012/ImageSets/Main/train.txt',  # update the path
            img_prefix=data_root + 'VOC2012/',  
            pipeline=train_pipeline)),  
    val=dict(  
        type=dataset_type,  
        ann_file=data_root + 'VOC2012/ImageSets/Main/val.txt',         # update the path
        img_prefix=data_root + 'VOC2012/',  
        pipeline=test_pipeline),  
    test=dict(  
        type=dataset_type,  
        ann_file=data_root + 'VOC2012/ImageSets/Main/test.txt',        # update the path
        img_prefix=data_root + 'VOC2012/',  
        pipeline=test_pipeline))  
evaluation = dict(interval=1, metric='mAP')

After making your changes, run print_config to check that the final parameters are what you expect.
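For this example, assuming the new files live in configs/ssd/ next to the originals, that means something like:

python tools/misc/print_config.py configs/ssd/ssd512_towervoc.py
python tools/train.py configs/ssd/ssd512_towervoc.py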

In addition to the above, the following two files need to be modified:

  • anaconda3\envs\conda_env_name\lib\python3.7\site-packages\mmdet\core\evaluation\class_names.py
  • anaconda3\envs\conda_env_name\lib\python3.7\site-packages\mmdet\datasets\voc.py

Change the classes to your own in both files: voc.py defines the CLASSES tuple of VOCDataset, and class_names.py defines the class list returned by voc_classes().
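A minimal sketch of the two edits, assuming hypothetical class names tower_a, tower_b, tower_c (use your own names; the surrounding code may differ slightly between mmdet versions):

# in mmdet/datasets/voc.py
class VOCDataset(XMLDataset):
    # replace the 20 PASCAL VOC class names with your own
    CLASSES = ('tower_a', 'tower_b', 'tower_c')

# in mmdet/core/evaluation/class_names.py
def voc_classes():
    # must list the same names, in the same order
    return ['tower_a', 'tower_b', 'tower_c']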

Note that modifying the mmdet code inside the project directory is useless here. During environment setup we ran pip install mmdet, so the mmdet actually in use is the installed Python package, not the mmdet folder under the project. Therefore, if the classes you want to train differ from the PASCAL VOC classes, you need to modify the two files above. The cleanest approach would of course be to create a new dataset .py file for your own data, but that is considerably more work.

Writing all this up is not easy; if it helped you, please give it a like~
