Some tricks for using models in mmdetection v2.0

I have recently been learning how to use mmdetection, and this post collects the tricks and tips I have come across so far.
References:

1. FP16 training

FP16 training is very easy to enable in mmdetection: just add one line to the model config file under configs.

_base_ = './faster_rcnn_r50_fpn_1x_coco.py'
# You can choose loss_scale yourself; a few hundred up to 1000 works well. 512 is used here.
fp16 = dict(loss_scale=512.)

Version 2.0 already ships with examples of this: see the configs under configs/fp16.
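Why a loss scale at all? With a fixed loss_scale, mmdetection's FP16 optimizer hook multiplies the loss by that value before backward and divides the gradients by it again (in FP32) before the update, so tiny gradients do not underflow to zero in half precision. The sketch below only illustrates that underflow effect with plain Python; to_fp16 is a hypothetical helper built on the struct module's half-precision "e" format, and the gradient value is made up for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    # struct's 'e' format packs/unpacks a half-precision float (Python 3.6+)
    return struct.unpack("e", struct.pack("e", x))[0]


grad = 1e-8          # a tiny gradient, below fp16's smallest subnormal value
loss_scale = 512.0   # same value as fp16 = dict(loss_scale=512.)

# Without scaling, the fp16 gradient underflows to zero
unscaled = to_fp16(grad)

# With scaling: backward yields grad * loss_scale in fp16,
# then the gradient is divided by loss_scale in full precision
scaled = to_fp16(grad * loss_scale) / loss_scale

print(unscaled)  # 0.0
print(scaled)    # roughly 1e-8 instead of 0
```

The larger the scale, the more small gradients survive, but a scale that is too large can overflow fp16 (max about 65504), which is one reason a moderate value such as 512 is used.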

2. Using the Albu (Albumentations) data augmentation library

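Albumentations is a good data augmentation library, and mmdetection supports it out of the box through its Albu transform, which is very easy to use. Taking configs/_base_/datasets/voc0712.py as an example, add the following to the pipeline: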

# Add to train_pipeline after the loading steps and before Normalize
# (CLAHE, for example, only accepts uint8 images)
dict(type="Albu", transforms=[
        dict(type="CLAHE", clip_limit=4.0, tile_grid_size=(8, 8), p=0.5),
        dict(type="ChannelShuffle", p=0.5),
        dict(type="RGBShift", r_shift_limit=20, g_shift_limit=20, b_shift_limit=20, p=0.5),
        dict(type="RandomBrightnessContrast", brightness_limit=0.2, contrast_limit=0.2, p=0.5),
        dict(type="HueSaturationValue", hue_shift_limit=20, sat_shift_limit=30, val_shift_limit=20, p=0.5),
        dict(type="OneOf", transforms=[
            dict(type="GaussianBlur", blur_limit=(3, 7), sigma_limit=0, p=0.5),
            dict(type="MedianBlur", blur_limit=7, p=0.5),
            dict(type="MotionBlur", p=0.5)], p=0.5)
    ]),

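Each type is the name of an Albumentations transform (identical to its class name), and the remaining keys are that transform's arguments. The transforms above only change pixel values; if you add geometric transforms that move boxes, also pass bbox_params and keymap as in the configs under configs/albu_example.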
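To make the "type = class name, other keys = arguments" rule concrete, here is a minimal sketch of how such a config dict can be turned into objects: look the type up by name, pass the rest as keyword arguments, and build nested transforms lists (as in OneOf) recursively. This is a simplified illustration of the idea, not mmdetection's actual code; albu_build, FakeTransform and fake_albu are hypothetical, and with the real library the namespace would be the albumentations module.

```python
from types import SimpleNamespace


def albu_build(cfg: dict, namespace):
    """Build a transform object from an Albu-style config dict."""
    args = dict(cfg)                           # copy so the config is not mutated
    cls = getattr(namespace, args.pop("type"))  # 'type' is the class name
    if "transforms" in args:
        # Containers such as OneOf hold a list of nested transform configs
        args["transforms"] = [albu_build(t, namespace) for t in args["transforms"]]
    return cls(**args)                          # remaining keys are constructor kwargs


class FakeTransform:
    """Hypothetical stand-in that just records its constructor arguments."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


# Hypothetical stand-in for the albumentations module: one class per name
fake_albu = SimpleNamespace(**{name: type(name, (FakeTransform,), {})
                               for name in ("OneOf", "MedianBlur", "MotionBlur")})

cfg = dict(type="OneOf", transforms=[
    dict(type="MedianBlur", blur_limit=7, p=0.5),
    dict(type="MotionBlur", p=0.5)], p=0.5)

blur = albu_build(cfg, fake_albu)
print(type(blur).__name__)                   # OneOf
print(blur.kwargs["transforms"][0].kwargs)   # {'blur_limit': 7, 'p': 0.5}
```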

3. Soft-NMS

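Soft-NMS refines standard NMS, which is rather brute-force: when a box's IoU with a higher-scoring box exceeds the threshold, Soft-NMS does not delete the box outright but lowers its confidence (score). If the score falls below a threshold, the box is discarded; if it is still relatively high, the box is kept. Edit the model file under configs/_base_/models, taking fast_rcnn_r50_fpn.py as an example: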

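    # Fragment from inside model=dict(...); the last ')' closes model
    test_cfg=dict(
        rcnn=dict(
            score_thr=0.05,
            # nms=dict(type='nms', iou_threshold=0.5),  # original hard NMS
            # early 2.x releases name this key iou_thr instead of iou_threshold
            nms=dict(type='soft_nms', iou_threshold=0.5),
            max_per_img=100)))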
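The decay rule described above can be sketched in a few lines of pure Python. This shows the linear variant, where an overlapping box's score is multiplied by (1 - IoU) instead of the box being removed; mmcv's soft_nms op also offers a Gaussian decay. The functions iou and soft_nms_linear are hypothetical helpers written for illustration, not mmcv's implementation, and min_score plays the role of the "score too low" cut-off.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms_linear(boxes, scores, iou_threshold=0.5, min_score=0.001):
    """Linear Soft-NMS: decay overlapping scores instead of deleting boxes.

    Returns a list of (box_index, final_score), in the order boxes are picked.
    """
    remaining = dict(enumerate(scores))
    keep = []
    while remaining:
        # Pick the current highest-scoring box
        best = max(remaining, key=remaining.get)
        keep.append((best, remaining.pop(best)))
        for i in list(remaining):
            overlap = iou(boxes[best], boxes[i])
            if overlap > iou_threshold:
                # Decay rather than delete: score *= (1 - IoU)
                remaining[i] *= 1.0 - overlap
            if remaining[i] < min_score:
                # Boxes whose score is now too low are discarded
                del remaining[i]
    return keep


boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
for idx, score in soft_nms_linear(boxes, scores):
    print(idx, round(score, 3))
# 0 0.9
# 2 0.7
# 1 0.145
```

Box 1 overlaps box 0 with IoU of about 0.82, so hard NMS at 0.5 would delete it; here it survives with a reduced score of 0.8 × (1 - 0.82) ≈ 0.145.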

4. GIoULoss

Similarly, edit the model file under configs/_base_/models:

    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        # Later 2.x releases: lets GIoULoss see decoded boxes instead of
        # regression deltas; drop it if your version rejects the key
        reg_decoded_bbox=True,
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        # loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
        loss_bbox=dict(type='GIoULoss', loss_weight=5.0)),
    roi_head=dict(
        type='StandardRoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', out_size=7, sample_num=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=10,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0.0, 0.0, 0.0, 0.0],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            reg_decoded_bbox=True,  # same reason as in rpn_head
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            # loss_bbox=dict(type='L1Loss', loss_weight=1.0))))
            loss_bbox=dict(type='GIoULoss', loss_weight=5.0))))
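For reference, GIoU loss is 1 - GIoU, where GIoU is IoU minus the fraction of the smallest enclosing box not covered by the union of the two boxes. It is computed on actual box coordinates, which is why it needs decoded boxes rather than regression deltas. The sketch below is a hypothetical pure-Python helper for two (x1, y1, x2, y2) boxes, not mmdetection's batched implementation.

```python
def giou(a, b):
    """Generalized IoU of two boxes (x1, y1, x2, y2); GIoU loss = 1 - giou."""
    # Intersection and union, as in ordinary IoU
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest axis-aligned box enclosing both a and b
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    enclose = (cx2 - cx1) * (cy2 - cy1)
    # Penalize the part of the enclosing box that neither box covers
    return iou - (enclose - union) / enclose


gt = (0, 0, 1, 1)
print(1 - giou(gt, gt))            # 0.0 for a perfect match
print(1 - giou(gt, (2, 0, 3, 1)))  # about 1.333 (disjoint, close)
print(1 - giou(gt, (4, 0, 5, 1)))  # about 1.6 (disjoint, farther away)
```

Plain IoU is 0 for both disjoint predictions, so an IoU loss cannot tell them apart; GIoU still grows with distance, which gives a useful training signal.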

5. Online hard example mining (OHEM)

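Online hard example mining selects hard examples (those with a large loss) online during training and trains on them. The setup is simple: again, edit the model file under configs/_base_/models. The snippet below is the rcnn part of train_cfg for a three-stage Cascade R-CNN, with each stage's sampler switched to OHEMSampler (the stock config uses RandomSampler).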

    rcnn=[
        dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.4,  # changed (stock Cascade R-CNN uses 0.5/0.6/0.7)
                neg_iou_thr=0.4,
                min_pos_iou=0.4,
                ignore_iof_thr=-1),
            sampler=dict(
                type='OHEMSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False),
        dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                ignore_iof_thr=-1),
            sampler=dict(
                type='OHEMSampler',  # handles hard vs. easy samples and also the positive/negative ratio
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False),
        dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.6,
                neg_iou_thr=0.6,
                min_pos_iou=0.6,
                ignore_iof_thr=-1),
            sampler=dict(
                type='OHEMSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)
    ],
    stage_loss_weights=[1, 0.5, 0.25])
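Conceptually, the sampler's selection step is simple: given a loss for every candidate RoI, keep the highest-loss ones, with positives capped at num × pos_fraction and negatives filling the rest. The sketch below shows only that step, under the assumption that per-RoI losses are already available (in mmdetection, OHEMSampler obtains them by running the bbox head on the candidates). ohem_select is a hypothetical helper, not mmdetection's API.

```python
def ohem_select(losses, labels, num=512, pos_fraction=0.25):
    """Keep the hardest (highest-loss) RoIs, at most `num` in total.

    losses: per-RoI loss values; labels: 1 = positive, 0 = negative.
    Positives are capped at num * pos_fraction; negatives fill the remainder.
    """
    def hardest_first(indices):
        # Sort candidate indices by loss, largest first
        return sorted(indices, key=lambda i: losses[i], reverse=True)

    pos = [i for i, label in enumerate(labels) if label == 1]
    neg = [i for i, label in enumerate(labels) if label == 0]
    num_pos = min(len(pos), int(num * pos_fraction))
    chosen_pos = hardest_first(pos)[:num_pos]
    chosen_neg = hardest_first(neg)[:num - num_pos]
    return chosen_pos, chosen_neg


losses = [0.1, 2.0, 0.5, 3.0, 0.2, 1.5]
labels = [1, 1, 0, 0, 0, 0]
print(ohem_select(losses, labels, num=4, pos_fraction=0.25))  # ([1], [3, 5, 2])
```

Compared with random sampling, the easy negative at index 4 (loss 0.2) is never picked, so the batch concentrates on the RoIs the model currently gets most wrong.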

6. Slimming down the checkpoint

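When mmdetection saves a checkpoint, it stores not only the weights but also metadata (meta) and the optimizer state. Some of this is useless at test time, so how do we strip it to shrink the file (by about 50%, since with SGD and momentum the optimizer state is roughly as large as the weights)? mmdetection's own publish_model.py script under tools/ also drops the optimizer state; the code below does it by hand: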

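import torch

model_path = "epoch_30.pth"
# Load onto the CPU so this also works on a machine without a GPU
checkpoint = torch.load(model_path, map_location="cpu")

# Keep only the weights. 'optimizer' is only needed to resume training;
# 'meta' is small but holds e.g. CLASSES, so keep it as well if your
# inference code reads class names from the checkpoint.
slim = {"state_dict": checkpoint["state_dict"]}

torch.save(slim, "./epoch_30_new.pth")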

7. Multi-scale training

Just change img_scale in train_pipeline and test_pipeline of the dataset config under configs/_base_/datasets, replacing the single tuple with a list such as [(), ()] or [(), (), ()…]. A simple example:

# img_norm_cfg is defined earlier in the same dataset config file
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    # dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='Resize', img_scale=[(1333, 800), (133, 80)], keep_ratio=True, multiscale_mode="range"),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]

Parameter notes:

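multiscale_mode accepts "range" or "value". With multiscale_mode="range", img_scale must contain exactly two scales, and the image size varies between them. With multiscale_mode="value", img_scale can contain any number of scales: for [(2000, 1200), (1666, 1000), (1333, 800)], one of the three scales is picked at random as the training size of each image.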
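keep_ratio is True or False; with True, the image keeps its aspect ratio. Suppose img_scale=[(2000, 1200), (1333, 800)] in range mode: a long-edge limit is drawn from [1333, 2000] and a short-edge limit from [800, 1200], independently. The image is then scaled by the largest factor that keeps its long edge within the first limit and its short edge within the second. For an image whose long edge is 1.5 times its short edge, limits of (1800, 1100) give a short edge of 1100 and a long edge of 1100 × 1.5 = 1650, while limits of (1500, 1100) cap the long edge at 1500, giving 1500 × 1000. With False, the image is resized directly to the sampled (width, height), ignoring its aspect ratio.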
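The range-mode sampling and keep_ratio scaling described above can be reproduced in a short sketch. It follows my reading of mmdetection's Resize (long and short edges sampled independently) and mmcv's rescale logic (largest factor that fits both limits); sample_range_scale and rescaled_size are hypothetical names, not library functions.

```python
import random


def sample_range_scale(img_scales, rng=random):
    """Range mode: sample long-edge and short-edge limits independently."""
    assert len(img_scales) == 2, "range mode takes exactly two scales"
    longs = [max(s) for s in img_scales]
    shorts = [min(s) for s in img_scales]
    long_edge = rng.randint(min(longs), max(longs))     # bounds are inclusive
    short_edge = rng.randint(min(shorts), max(shorts))
    return long_edge, short_edge


def rescaled_size(w, h, scale):
    """keep_ratio=True: largest (w, h) with long edge <= max(scale) and
    short edge <= min(scale), aspect ratio preserved."""
    factor = min(max(scale) / max(w, h), min(scale) / min(w, h))
    return int(w * factor + 0.5), int(h * factor + 0.5)


rng = random.Random(0)
long_edge, short_edge = sample_range_scale([(2000, 1200), (1333, 800)], rng)
print(1333 <= long_edge <= 2000, 800 <= short_edge <= 1200)  # True True

# A 1500 x 1000 image (long edge = 1.5 x short edge), as in the text
print(rescaled_size(1500, 1000, (1800, 1100)))  # (1650, 1100)
print(rescaled_size(1500, 1000, (1500, 1100)))  # (1500, 1000)
```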

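In test_pipeline, img_scale can contain any number of scales, which means multi-scale testing on the test set (a form of TTA); in the 2.x dataset configs it is set on MultiScaleFlipAug rather than on Resize.
For the implementation details, see the Resize class in mmdet/datasets/pipelines/transforms.py.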
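A sketch of such a multi-scale test_pipeline, modeled on the layout of the stock 2.x dataset configs: every scale in MultiScaleFlipAug's img_scale is run, and for detectors that support augmented testing the results are merged. The three scales are borrowed from the example above and are not tuned values; img_norm_cfg is the standard ImageNet normalization that most mmdetection configs define, repeated here only so the fragment is self-contained.

```python
# Standard ImageNet normalization used by most mmdetection configs
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        # Each scale listed here is tested; set flip=True to add flip TTA
        img_scale=[(1333, 800), (1666, 1000), (2000, 1200)],
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),  # scale comes from MultiScaleFlipAug
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
```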

8. Deformable convolution

Just add the following lines to a model config under configs:

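_base_ = '../faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'
model = dict(
    backbone=dict(
        # Early 2.x releases call this key deformable_groups
        dcn=dict(type='DCN', deform_groups=1, fallback_on_stride=False),
        # One flag per ResNet stage (res2 to res5): DCN in stages 3 to 5
        stage_with_dcn=(False, True, True, True)))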

Deformable convolution can be configured in many ways, and mmdetection already implements many variants; see the models under configs/dcn for details.