Replacing the mmdetection backbone with existing models from mmcls and timm

        The official document "Tutorial 11: How to xxx — MMDetection 2.24.1 documentation" gives a detailed introduction to backbone replacement; newer versions already support the classification networks in the mmcls and timm libraries, which can generally be used directly with small modifications. The most important point is to ensure that the modified backbone matches the subsequent neck structure, mainly in terms of channel counts. The general structure of an object detection model is shown in the figure below. If the neck structure no longer fits after the backbone is changed, model construction fails and an error is raised. The following uses the yolox model in mmdetection as an example to supplement the backbone-replacement method in the official document, and finally gives an example of swapping a swin transformer into the yolov3 model.
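The constraint can be stated concretely: the backbone's per-stage output channels must equal the neck's in_channels list, entry by entry. Below is a minimal, hypothetical helper (not part of mmdetection) that catches the mismatch before the model is built, rather than letting it surface later as a shape error inside the neck's forward pass:

```python
def check_backbone_neck(backbone_out_channels, neck_in_channels):
    """Raise early if the backbone's stage channels don't match the neck.

    Hypothetical helper for illustration; mmdetection itself only fails
    later, inside the neck, with a less obvious shape error.
    """
    if list(backbone_out_channels) != list(neck_in_channels):
        raise ValueError(
            f'backbone outputs {list(backbone_out_channels)} channels '
            f'but neck expects {list(neck_in_channels)}')


# cspdarknet (widen_factor=0.5) stages vs. YOLOXPAFPN in_channels: compatible
check_backbone_neck([128, 256, 512], [128, 256, 512])
```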


1. Mmcls backbone model replacement

        Use the command pip install mmcls to install mmcls, then import mmcls.models to view the supported backbones, as shown in the figure below. The mmcls library covers most mainstream and classic models, which reflects openmmlab's goal of building an integrated ecosystem.

         Taking the lightweight ShuffleNetV2 model as an example, we replace yolox's original cspdarknet backbone network with it. First look at the model interface of mmcls.models.ShuffleNetV2:

(figure: the mmcls.models.ShuffleNetV2 constructor signature)

        Note that the out_indices parameter defaults to the convolution output of the fourth stage only, while yolox's original design feeds the feature maps of 3 stages into the neck:

    #============== CSPDarknet ==============
    backbone=dict(type='CSPDarknet', deepen_factor=0.33, widen_factor=0.5),
    neck=dict(
        type='YOLOXPAFPN',
        in_channels=[128, 256, 512],
        out_channels=128,
        num_csp_blocks=1),
    #============== end =================

        Following this idea, we can output the feature maps of specific stages, for example by setting out_indices to (1, 2, 3). The next step is to determine the output channels of those stages so that they match the neck's in_channels parameter. The following code prints the output channel counts of the ShuffleNetV2 model:

from mmcls.models import ShuffleNetV1, ShuffleNetV2, MobileNetV2, MobileNetV3
import torch

# m = MobileNetV3(out_indices=(3, 8, 11))
m = ShuffleNetV2(out_indices=(0, 1, 2, 3))
m.eval()
inputs = torch.rand(1, 3, 640, 640)
level_outputs = m(inputs)
for level_out in level_outputs:
    print(tuple(level_out.shape))

# Output:
# (1, 116, 80, 80)
# (1, 232, 40, 40)
# (1, 464, 20, 20)
# (1, 1024, 20, 20)
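The spatial sizes above follow directly from the stage strides: ShuffleNetV2's four outputs sit at strides 8, 16, 32 and 32 (the final stage is a 1x1 conv head that keeps the stride-32 resolution). A quick sanity sketch, assuming those strides, reproduces the 80/40/20 grid sizes for a 640x640 input:

```python
def feature_map_size(input_size, stride):
    # Spatial side length of a feature map produced at the given stride.
    return input_size // stride

# ShuffleNetV2 stage strides for out_indices=(0, 1, 2, 3); the last stage
# is a 1x1 conv head at the same stride-32 resolution as the one before it.
strides = [8, 16, 32, 32]
print([feature_map_size(640, s) for s in strides])  # [80, 40, 20, 20]
```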

        From the above results, the output channels corresponding to indices (1, 2, 3) are (232, 464, 1024), so the model section of the yolox configuration file is modified as follows:

# please install mmcls>=0.20.0
# import mmcls.models to trigger register_module in mmcls
custom_imports = dict(imports=['mmcls.models'], allow_failed_imports=False)
# NOTE: this URL points to a ShuffleNetV1 checkpoint; for the ShuffleNetV2
# backbone below, pick the matching ShuffleNetV2 weight from the mmcls model zoo.
pretrained = 'https://download.openmmlab.com/mmclassification/v0/shufflenet_v1/shufflenet_v1_batch1024_imagenet_20200804-5d6cec73.pth'

# model settings
model = dict(
    type='YOLOX',
    input_size=img_scale,
    random_size_range=(15, 25),
    random_size_interval=10,
    #============== shufflenet v2 =================
    backbone=dict(
        # _delete_=True,
        type='mmcls.ShuffleNetV2',
        out_indices=(1, 2, 3),  # stage channels are (116, 232, 464, 1024)
        init_cfg=dict(
            type='Pretrained',
            checkpoint=pretrained,
            prefix='backbone.')),
    neck=dict(
        type='YOLOXPAFPN',
        in_channels=[232, 464, 1024],
        out_channels=128,
        num_csp_blocks=1),
    #============== end =================
)


Note that custom_imports = dict(imports=['mmcls.models'], allow_failed_imports=False) must be added at the beginning of the configuration file so that mmdetection can register the mmcls models. pretrained is the pre-trained weight URL, which can be found in the official documentation (ShuffleNet V1 — MMClassification 0.23.0 documentation).
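The mapping from out_indices to the neck's in_channels is just index selection over the stage-channel list printed earlier. A small sketch using the numbers from this section:

```python
# Per-stage output channels of ShuffleNetV2, as printed above.
stage_channels = [116, 232, 464, 1024]
out_indices = (1, 2, 3)

# The neck's in_channels must list exactly the channels of the selected stages.
in_channels = [stage_channels[i] for i in out_indices]
print(in_channels)  # [232, 464, 1024]
```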

2. TIMM backbone model replacement

        The timm library also contains a large number of image classification models, which can be called through the mmcls.TIMMBackbone wrapper. Usage is roughly the same as the previous steps: import mmcls first, then set the corresponding model parameters. Taking mobilenetv2 as an example, the model configuration is modified as follows:

# please install mmcls>=0.20.0
# import mmcls.models to trigger register_module in mmcls
custom_imports = dict(imports=['mmcls.models'], allow_failed_imports=False)
# model settings
model = dict(
    type='YOLOX',
    input_size=img_scale,
    random_size_range=(15, 25),
    random_size_interval=10,
    #============== mobilenet v2 (timm) =================
    backbone=dict(
        # _delete_=True,
        type='mmcls.TIMMBackbone',
        model_name='mobilenetv2_100',
        features_only=True,
        pretrained=True,
        out_indices=(2, 3, 4)),
    neck=dict(
        type='YOLOXPAFPN',
        in_channels=[32, 96, 320],
        out_channels=128,
        num_csp_blocks=1),
    #============== end =================
)

3. Replacing the yolov3 backbone with swin transformer

        Finally, as an example, we replace the default darknet53 backbone network in mmdetection's yolov3 model with a swin transformer via the mmcls library; the key configuration changes are as follows:

# please install mmcls>=0.20.0
# import mmcls.models to trigger register_module in mmcls
custom_imports = dict(imports=['mmcls.models'], allow_failed_imports=False)

# model settings
model = dict(
    type='YOLOV3',
    
    # backbone=dict(
    #     type='Darknet',
    #     depth=53,
    #     out_indices=(3, 4, 5),
    #     init_cfg=dict(type='Pretrained', checkpoint='open-mmlab://darknet53')),
    # neck=dict(
    #     type='YOLOV3Neck',
    #     num_scales=3,
    #     in_channels=[1024, 512, 256],
    #     out_channels=[512, 256, 128]),

    #============== MobileNetV3 ================
    # backbone=dict(
    #     type='mmcls.MobileNetV3',
    #     arch='small',
    #     out_indices=(3, 8, 11),
    #     init_cfg=dict(type='Pretrained', checkpoint='https://download.openmmlab.com/mmclassification/v0/mobilenet_v3/convert/mobilenet_v3_small-8427ecf0.pth',prefix='backbone.')),
    # neck=dict(
    #     type='YOLOV3Neck',
    #     num_scales=3,
    #     in_channels=[96, 48, 24],  # the order is reversed
    #     out_channels=[512, 256, 128]),
    #==============  end  ===============

    #============== SwinTransformer ================
    backbone=dict(
        type='mmcls.SwinTransformer',
        arch='tiny',
        out_indices=(0,1,2),  # 192,384,768
        init_cfg=dict(type='Pretrained', checkpoint='https://download.openmmlab.com/mmclassification/v0/swin-transformer/swin_tiny_224_b16x64_300e_imagenet_20210616_090925-66df6be6.pth',prefix='backbone.')),
    neck=dict(
        type='YOLOV3Neck',
        num_scales=3,
        in_channels=[768, 384, 192],  # the order is reversed
        out_channels=[512, 256, 128]),
    #==============  end  ===============
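One detail worth highlighting from the comments above: YOLOV3Neck lists in_channels starting from the deepest stage, which is the reverse of the backbone's output order (and of YOLOXPAFPN's convention). The relationship is a one-line reversal:

```python
# Swin-tiny stage channels for out_indices=(0, 1, 2), shallow to deep.
backbone_channels = [192, 384, 768]

# YOLOV3Neck expects the deepest stage first, so reverse the list.
neck_in_channels = backbone_channels[::-1]
print(neck_in_channels)  # [768, 384, 192]
```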

4. Note that ViT does not support features_only, so using it this way to extract multi-scale features will raise an error


Original link: https://blog.csdn.net/ouening/article/details/124889709
