Interpreting the model framework file and its parameters in MMDetection (1)

This article introduces the model file (model.py), one of the configuration files that make up a complete MMDetection configuration.

The sections below go through the model file and explain, line by line, what each piece of code means and how it is used.

1. Feature extraction network backbone

 The Swin Transformer shown in the figure above is used as the backbone, that is, the feature extraction network, with the following configuration.
 To use a different backbone, change type='new model name' and define the parameters required by the model you choose. The example below uses SwinTransformer as the backbone and explains the meaning of each of its parameters.

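    backbone=dict(
        type='SwinTransformer',     # backbone (feature extraction network): Swin Transformer; its parameters follow
        embed_dim=96,               # embedding dimension of the first stage; the four stages use [96, 192, 384, 768]
        depths=[2, 2, 6, 2],        # number of Swin Transformer blocks (alternating W-MSA and SW-MSA) in each of the four stages
        num_heads=[3, 6, 12, 24],   # number of attention heads in each of the four stages
        window_size=7,              # attention window size used in every stage
        mlp_ratio=4.,               # ratio of the MLP hidden dimension to the embedding dimension, 4 by default
        qkv_bias=True,              # add a learnable bias to the query, key, and value projections
        qk_scale=None,              # override for the attention scale; None uses head_dim ** -0.5
        drop_rate=0.,               # dropout rate
        attn_drop_rate=0.,          # dropout rate applied to the attention weights
        drop_path_rate=0.2,         # stochastic depth (drop path) rate
        ape=False,                  # whether to add an absolute position embedding to the patch embedding
        patch_norm=True,            # whether to apply normalization after the patch embedding
        out_indices=(0, 1, 2, 3),   # indices of the stages whose feature maps are output
        use_checkpoint=False),      # whether to use gradient checkpointing to save memory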
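
The stage widths quoted in the embed_dim comment follow from the channel count doubling at each stage. The snippet below is a minimal sketch of that arithmetic, assuming the standard four-stage layout; swin_stage_channels is a hypothetical helper written for this article, not an MMDetection function.

```python
def swin_stage_channels(embed_dim, num_stages=4):
    """Channel width of each stage, assuming the width doubles at every stage."""
    return [embed_dim * 2 ** i for i in range(num_stages)]


channels = swin_stage_channels(96)
print(channels)  # [96, 192, 384, 768]

# With num_heads=[3, 6, 12, 24], every head works on the same number of channels.
head_dims = [c // h for c, h in zip(channels, [3, 6, 12, 24])]
print(head_dims)  # [32, 32, 32, 32]
```

This is why num_heads doubles from stage to stage in the configuration: the per-head dimension stays constant while the stage width grows.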

2. Feature enhancement network: neck

 The Feature Pyramid Network (FPN) shown in the figure above is used as the neck, which enhances the features produced by the backbone, with the following configuration.
 Similarly, to use a different neck, change type='new model name' and define the parameters required by the model you choose. The example below uses FPN as the neck. in_channels lists the channel counts of the four Swin Transformer stages, out_channels is the number of output channels at every scale, and num_outs is the number of output scales. In the RCNN family the neck outputs one more scale than the backbone, so num_outs = number of backbone scales + 1.

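    neck=dict(
        type='FPN',                     # the neck (feature enhancement) uses FPN, the feature pyramid network
        in_channels=[96, 192, 384, 768],# number of input channels at each scale
        out_channels=256,               # number of output channels at each scale, the same for every scale
        num_outs=5),                    # number of output scales; in RCNN this is 4+1, where 4 is the number of Swin Transformer backbone scales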
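
The relationship between the backbone and the neck settings can be checked with a few lines of plain Python. This is a sketch under two assumptions: the backbone strides are [4, 8, 16, 32], and each extra FPN level halves the resolution of the previous one. fpn_output_strides is a hypothetical helper, not part of MMDetection.

```python
def fpn_output_strides(backbone_strides, num_outs):
    """Strides of the FPN outputs, assuming every extra level halves the resolution."""
    strides = list(backbone_strides)
    while len(strides) < num_outs:
        strides.append(strides[-1] * 2)
    return strides[:num_outs]


backbone_channels = [96, 192, 384, 768]
neck = dict(type='FPN', in_channels=[96, 192, 384, 768], out_channels=256, num_outs=5)

# in_channels must list exactly the channel widths produced by the backbone.
assert neck['in_channels'] == backbone_channels
# num_outs is the number of backbone scales plus one.
assert neck['num_outs'] == len(backbone_channels) + 1

strides = fpn_output_strides([4, 8, 16, 32], neck['num_outs'])
print(strides)  # [4, 8, 16, 32, 64]
```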

3. The two stages of the detector

Stage one: extracting proposals (candidate boxes) with the RPN

RPN stands for Region Proposal Network. It provides high-quality candidate boxes for the second stage.
① Anchor generator
Generates anchors according to the given scales, ratios, and strides.
② Anchor target generator
The anchor target layer decides which anchors are positive samples (they contain a real object) and which are negative samples (they contain only background). It does this by computing the IoU between each anchor and the ground-truth boxes.
③ RPN loss
The RPN has two tasks. The classification task is to judge, among the many anchors, which are positive samples and which are negative samples. The regression task is to regress each positive anchor towards its real target box. The loss therefore has two parts: the loss function of the classification branch and the loss function of the regression branch.
④ Proposal generator
Produces the candidate boxes, which serve as high-quality RoI boxes for the second stage.
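
The IoU-based labelling of anchors can be sketched in plain Python. This is a simplified illustration, not MMDetection's MaxIoUAssigner: iou and label_anchor are hypothetical helpers, the thresholds 0.7 and 0.3 are the values used by the RPN assigner in the train_cfg later in this article, and the low-quality matching rule is left out.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_anchor(anchor, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    """Return 1 (positive), 0 (negative) or -1 (ignored) for one anchor."""
    best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
    if best >= pos_thr:
        return 1
    if best < neg_thr:
        return 0
    return -1  # between the two thresholds: not used for training


gt_boxes = [(0, 0, 100, 100)]
anchors = [(0, 0, 100, 100), (200, 200, 300, 300), (0, 0, 100, 50)]
labels = [label_anchor(a, gt_boxes) for a in anchors]
print(labels)  # [1, 0, -1]
```

The third anchor overlaps the ground truth with an IoU of 0.5, which falls between the two thresholds, so it is neither positive nor negative.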

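    rpn_head=dict(
        type='RPNHead',                 # use an RPN to extract proposals (candidate boxes)
        in_channels=256,                # number of channels in the input feature maps
        feat_channels=256,              # number of channels of the feature maps inside the head
        anchor_generator=dict(
            type='AnchorGenerator',     # anchor generator configuration
            scales=[8],                 # anchor scales: width and height are enlarged by each scale value, so the anchor size is scales * strides
            ratios=[0.5, 1.0, 2.0],     # height-to-width ratios of the anchors
            strides=[4, 8, 16, 32, 64]),# strides of the feature levels relative to the input image (anchor_strides in older configs), e.g. [4, 8, 16, 32, 64].
                                        # strides[0] means that moving one point on the P2 feature map corresponds to moving 4 pixels in the original image;
                                        # strides[1] means that moving one point on the P3 feature map corresponds to moving 8 pixels in the original image;
                                        # and so on. Because of this, the strides also give the base size, in the original image, of the anchors at each level.
                                        # Accordingly, AnchorGenerator uses the strides as base_sizes when base_sizes is not given.
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder', # box coder used to encode and decode anchor boxes during training
            target_means=[.0, .0, .0, .0],      # target means used when encoding and decoding boxes
            target_stds=[1.0, 1.0, 1.0, 1.0]),  # target standard deviations used when encoding and decoding boxes
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),# classification loss: CrossEntropyLoss; the RPN only separates objects from background (binary classification), so sigmoid is normally used
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),                # regression loss: L1Loss, with its loss weight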
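
To make the scales, ratios, and strides settings concrete, the sketch below computes the anchor shapes at each level. It assumes the convention that ratio means height divided by width and that the base size of a level equals its stride; base_anchor_sizes is a hypothetical helper and leaves out the anchor positions on the feature-map grid.

```python
import math


def base_anchor_sizes(stride, scales, ratios):
    """(width, height) of each anchor at one level; ratio is height / width."""
    sizes = []
    for ratio in ratios:
        for scale in scales:
            side = stride * scale          # side length of the square anchor
            h = side * math.sqrt(ratio)    # stretch the height ...
            w = side / math.sqrt(ratio)    # ... and shrink the width, keeping the area
            sizes.append((w, h))
    return sizes


for stride in [4, 8, 16, 32, 64]:
    sizes = base_anchor_sizes(stride, [8], [0.5, 1.0, 2.0])
    print(stride, [(round(w, 1), round(h, 1)) for w, h in sizes])
```

With scales=[8], the square anchor is 32x32 pixels on the stride-4 level and 512x512 pixels on the stride-64 level; the other two ratios keep the same area and change only the shape.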

Stage two: extracting features for each candidate region and sending them to the classifier for category discrimination (RoI)

RoI stands for Region of Interest, that is, a region of an image that is of interest.
① Proposal target generator
Selects a certain number of RoIs from the proposals generated by the RPN as the training samples for the second stage (a mini-batch, generally 256 or 512 proposals per image), and sets the ratio of positive to negative samples within the mini-batch.
② Feature crop and pooling
For each RoI obtained in the first step, an appropriate feature level is selected according to the size of the RoI, and the features inside the RoI are cropped and pooled into a fixed-size feature map. This process is called RoI Align. RoI Align uses bilinear interpolation to obtain the feature values at its sampling points.
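
Selecting a feature level from the RoI size can be sketched as below. The rule shown is the log2-based mapping that I believe SingleRoIExtractor uses, with finest_scale=56 assumed as its default; treat both as assumptions and check them against your MMDetection version. map_roi_to_level is a hypothetical helper.

```python
import math


def map_roi_to_level(roi, num_levels=4, finest_scale=56):
    """Pick a feature level (0 = finest) from the RoI size, RoI given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = roi
    scale = math.sqrt((x2 - x1) * (y2 - y1))   # geometric mean of width and height
    level = math.floor(math.log2(scale / finest_scale + 1e-6))
    return min(max(level, 0), num_levels - 1)  # clamp to the available levels


for roi in [(0, 0, 50, 50), (0, 0, 112, 112), (0, 0, 224, 224), (0, 0, 900, 900)]:
    print(roi, '-> level', map_roi_to_level(roi))
```

Small RoIs are pooled from the high-resolution stride-4 map and large RoIs from the coarse stride-32 map, which is why featmap_strides lists four levels.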

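    roi_head=dict(                                                      # second stage of the detector
        type='StandardRoIHead',
        bbox_roi_extractor=dict(                                        # RoI feature extractor for bounding boxes
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),# RoI layer: RoI Align with a 7x7 output feature map; sampling_ratio is the sampling rate used when extracting RoI features (0 means adaptive)
            out_channels=256,                                                # number of output channels of the extracted features
            featmap_strides=[4, 8, 16, 32]),                                 # strides of the multi-scale feature maps; must match the backbone architecture (the first four RPN strides)
        bbox_head=dict(                                                 # configuration of the box head in the RoI head
            type='Shared2FCBBoxHead',
            in_channels=256,                                            # input channels of bbox_head, equal to out_channels of bbox_roi_extractor
            fc_out_channels=1024,                                       # number of output channels of the fully connected (FC) layers
            roi_feat_size=7,                                            # size of the RoI features
            num_classes=80,                                             # number of classes; set it to match your own dataset
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0., 0., 0., 0.],                          # target means used when encoding and decoding boxes
                target_stds=[0.1, 0.1, 0.2, 0.2]),                      # standard deviations for encoding and decoding; smaller because the boxes are more accurate here; [0.1, 0.1, 0.2, 0.2] is the usual setting
            reg_class_agnostic=False,                                   # whether box regression is class-agnostic
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),# classification loss: CrossEntropyLoss; use_sigmoid=False means softmax is used instead of sigmoid
            loss_bbox=dict(type='L1Loss', loss_weight=1.0)),                 # regression loss: L1Loss, with its loss weight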
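
The role of target_means and target_stds is easier to see with the delta encoding written out. The following is a minimal sketch of the usual (dx, dy, dw, dh) parameterization, not the DeltaXYWHBBoxCoder source; encode_deltas and decode_deltas are hypothetical helpers and the example boxes are made up.

```python
import math


def _center_size(box):
    """Convert (x1, y1, x2, y2) to (center x, center y, width, height)."""
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1


def encode_deltas(proposal, gt, means=(0., 0., 0., 0.), stds=(0.1, 0.1, 0.2, 0.2)):
    """Regression targets that move `proposal` onto `gt`, normalized by means and stds."""
    px, py, pw, ph = _center_size(proposal)
    gx, gy, gw, gh = _center_size(gt)
    raw = ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))
    return tuple((d - m) / s for d, m, s in zip(raw, means, stds))


def decode_deltas(proposal, deltas, means=(0., 0., 0., 0.), stds=(0.1, 0.1, 0.2, 0.2)):
    """Inverse of encode_deltas: apply predicted deltas to a proposal."""
    dx, dy, dw, dh = (d * s + m for d, m, s in zip(deltas, means, stds))
    px, py, pw, ph = _center_size(proposal)
    gx, gy = px + pw * dx, py + ph * dy
    gw, gh = pw * math.exp(dw), ph * math.exp(dh)
    return (gx - gw / 2, gy - gh / 2, gx + gw / 2, gy + gh / 2)


proposal = (10.0, 10.0, 110.0, 60.0)
gt = (20.0, 15.0, 100.0, 75.0)
deltas = encode_deltas(proposal, gt)
print(deltas)
print(decode_deltas(proposal, deltas))  # recovers gt up to floating-point error
```

Dividing by small standard deviations such as 0.1 and 0.2 scales up the targets, which are small in the second stage because the proposals are already close to the ground truth.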

4. Hyperparameter configuration for model training and testing

train_cfg and test_cfg hold the hyperparameters for sample assignment and sampling in the RPN and RCNN stages, together with the proposal and post-processing settings. The individual parameters are explained below:

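    train_cfg=dict(                                         # training hyperparameters for the RPN and RCNN
        rpn=dict(
            assigner=dict(                                  # configuration of the assigner that labels positive and negative samples
                type='MaxIoUAssigner',                      # use MaxIoUAssigner
                pos_iou_thr=0.7,                            # IoU >= 0.7 counts as a positive sample
                neg_iou_thr=0.3,                            # IoU < 0.3 counts as a negative sample
                min_pos_iou=0.3,                            # minimum IoU for a box to be taken as a positive sample
                match_low_quality=True,                     # whether to match low-quality boxes
                ignore_iof_thr=-1),                         # IoF threshold for ignoring bboxes; -1 means none are ignored
            sampler=dict(                                   # configuration of the positive/negative sampler
                type='RandomSampler',                       # use RandomSampler
                num=256,                                    # number of samples to draw
                pos_fraction=0.5,                           # fraction of positive samples among all samples
                neg_pos_ub=-1,                              # upper bound on negatives relative to the number of positives; extra negatives are dropped; -1 means no limit
                add_gt_as_proposals=False),                 # whether to add the GT boxes as proposals after sampling
            allowed_border=-1,                              # border allowed for valid anchors after padding; -1 disables the border check
            pos_weight=-1,                                  # weight of positive samples during training; -1 leaves it unchanged
            debug=False),                                   # whether to enable debug mode
        rpn_proposal=dict(                                  # configuration for generating proposals during training
            nms_pre=2000,                                   # number of boxes kept before non-maximum suppression (NMS)
            max_per_img=1000,                               # number of boxes kept after NMS
            nms=dict(type='nms', iou_threshold=0.7),        # NMS with a threshold of 0.7
            min_bbox_size=0),                               # minimum allowed box size
        rcnn=dict(
            assigner=dict(                                  # assigner that labels positive and negative samples for the RCNN
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,                            # IoU >= 0.5 counts as a positive sample
                neg_iou_thr=0.5,                            # IoU < 0.5 counts as a negative sample
                min_pos_iou=0.5,                            # minimum IoU for a box to be taken as a positive sample
                match_low_quality=True,                     # whether to match low-quality boxes
                ignore_iof_thr=-1),                         # IoF threshold for ignoring bboxes; -1 means none are ignored
            sampler=dict(                                   # configuration of the positive/negative sampler
                type='RandomSampler',                       # use RandomSampler
                num=512,                                    # number of samples to draw
                pos_fraction=0.25,                          # fraction of positive samples among all samples
                neg_pos_ub=-1,                              # upper bound on negatives relative to the number of positives; extra negatives are dropped; -1 means no limit
                add_gt_as_proposals=True),                  # whether to add the GT boxes as proposals after sampling
            mask_size=28,                                   # size of the mask
            pos_weight=-1,                                  # weight of positive samples during training; -1 leaves it unchanged
            debug=False)),                                  # whether to enable debug mode
    test_cfg=dict(                                          # testing hyperparameters for the RPN and RCNN
        rpn=dict(
            nms_pre=1000,                                   # number of boxes kept before non-maximum suppression (NMS)
            max_per_img=1000,                               # number of boxes kept after NMS
            nms=dict(type='nms', iou_threshold=0.7),        # NMS with a threshold of 0.7
            min_bbox_size=0),                               # minimum allowed box size
        rcnn=dict(
            score_thr=0.05,                                 # score threshold for bboxes
            nms=dict(type='nms', iou_threshold=0.5),        # NMS with a threshold of 0.5
            max_per_img=100,                                # number of boxes kept after NMS
            mask_thr_binary=0.5)))                          # threshold for binarizing the predicted mask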
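
The effect of num and pos_fraction in the sampler can be illustrated with a small sketch. It assumes neg_pos_ub=-1 (no cap on negatives) and is not the RandomSampler source; sample_rois is a hypothetical helper that works on lists of sample indices.

```python
import random


def sample_rois(pos_indices, neg_indices, num=512, pos_fraction=0.25, seed=None):
    """Randomly pick at most num * pos_fraction positives, then fill up with negatives."""
    rng = random.Random(seed)
    max_pos = int(num * pos_fraction)
    pos = rng.sample(pos_indices, min(max_pos, len(pos_indices)))
    # Negatives take the remaining slots, so fewer positives means more negatives.
    num_neg = min(num - len(pos), len(neg_indices))
    neg = rng.sample(neg_indices, num_neg)
    return pos, neg


# Plenty of positives: capped at 512 * 0.25 = 128, the rest are negatives.
pos, neg = sample_rois(list(range(300)), list(range(300, 2300)), seed=0)
print(len(pos), len(neg))  # 128 384

# Few positives: all 40 are kept and negatives fill the remaining 472 slots.
pos, neg = sample_rois(list(range(40)), list(range(40, 2040)), seed=0)
print(len(pos), len(neg))  # 40 472
```

pos_fraction is therefore an upper bound on the share of positives rather than a guaranteed ratio.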

The complete model file, mask_rcnn_swin_fpn.py, with the parameter explanations, is given below:

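# model settings
model = dict(
    type='MaskRCNN',                # type of detector you use
    pretrained=None,
    backbone=dict(
        type='SwinTransformer',     # backbone (feature extraction network): Swin Transformer; its parameters follow
        embed_dim=96,               # embedding dimension of the first stage; the four stages use [96, 192, 384, 768]
        depths=[2, 2, 6, 2],        # number of Swin Transformer blocks (alternating W-MSA and SW-MSA) in each of the four stages
        num_heads=[3, 6, 12, 24],   # number of attention heads in each of the four stages
        window_size=7,              # attention window size used in every stage
        mlp_ratio=4.,               # ratio of the MLP hidden dimension to the embedding dimension, 4 by default
        qkv_bias=True,              # add a learnable bias to the query, key, and value projections
        qk_scale=None,              # override for the attention scale; None uses head_dim ** -0.5
        drop_rate=0.,               # dropout rate
        attn_drop_rate=0.,          # dropout rate applied to the attention weights
        drop_path_rate=0.2,         # stochastic depth (drop path) rate
        ape=False,                  # whether to add an absolute position embedding to the patch embedding
        patch_norm=True,            # whether to apply normalization after the patch embedding
        out_indices=(0, 1, 2, 3),   # indices of the stages whose feature maps are output
        use_checkpoint=False),      # whether to use gradient checkpointing to save memory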
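    neck=dict(
        type='FPN',                     # the neck (feature enhancement) uses FPN, the feature pyramid network
        in_channels=[96, 192, 384, 768],# number of input channels at each scale
        out_channels=256,               # number of output channels at each scale, the same for every scale
        num_outs=5),                    # number of output scales; in RCNN this is 4+1, where 4 is the number of Swin Transformer backbone scales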
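    rpn_head=dict(
        type='RPNHead',                 # use an RPN to extract proposals (candidate boxes)
        in_channels=256,                # number of channels in the input feature maps
        feat_channels=256,              # number of channels of the feature maps inside the head
        anchor_generator=dict(
            type='AnchorGenerator',     # anchor generator configuration
            scales=[8],                 # anchor scales: width and height are enlarged by each scale value, so the anchor size is scales * strides
            ratios=[0.5, 1.0, 2.0],     # height-to-width ratios of the anchors
            strides=[4, 8, 16, 32, 64]),# strides of the feature levels relative to the input image (anchor_strides in older configs), e.g. [4, 8, 16, 32, 64].
                                        # strides[0] means that moving one point on the P2 feature map corresponds to moving 4 pixels in the original image;
                                        # strides[1] means that moving one point on the P3 feature map corresponds to moving 8 pixels in the original image;
                                        # and so on. Because of this, the strides also give the base size, in the original image, of the anchors at each level.
                                        # Accordingly, AnchorGenerator uses the strides as base_sizes when base_sizes is not given.
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder', # box coder used to encode and decode anchor boxes during training
            target_means=[.0, .0, .0, .0],      # target means used when encoding and decoding boxes
            target_stds=[1.0, 1.0, 1.0, 1.0]),  # target standard deviations used when encoding and decoding boxes
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),# classification loss: CrossEntropyLoss; the RPN only separates objects from background (binary classification), so sigmoid is normally used
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),                # regression loss: L1Loss, with its loss weight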
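    roi_head=dict(                                                      # second stage of the detector
        type='StandardRoIHead',
        bbox_roi_extractor=dict(                                        # RoI feature extractor for bounding boxes
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),# RoI layer: RoI Align with a 7x7 output feature map; sampling_ratio is the sampling rate used when extracting RoI features (0 means adaptive)
            out_channels=256,                                                # number of output channels of the extracted features
            featmap_strides=[4, 8, 16, 32]),                                 # strides of the multi-scale feature maps; must match the backbone architecture (the first four RPN strides)
        bbox_head=dict(                                                 # configuration of the box head in the RoI head
            type='Shared2FCBBoxHead',
            in_channels=256,                                            # input channels of bbox_head, equal to out_channels of bbox_roi_extractor
            fc_out_channels=1024,                                       # number of output channels of the fully connected (FC) layers
            roi_feat_size=7,                                            # size of the RoI features
            num_classes=80,                                             # number of classes; set it to match your own dataset
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0., 0., 0., 0.],                          # target means used when encoding and decoding boxes
                target_stds=[0.1, 0.1, 0.2, 0.2]),                      # standard deviations for encoding and decoding; smaller because the boxes are more accurate here; [0.1, 0.1, 0.2, 0.2] is the usual setting
            reg_class_agnostic=False,                                   # whether box regression is class-agnostic
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),# classification loss: CrossEntropyLoss; use_sigmoid=False means softmax is used instead of sigmoid
            loss_bbox=dict(type='L1Loss', loss_weight=1.0)),                 # regression loss: L1Loss, with its loss weight
        mask_roi_extractor=dict(                                        # RoI feature extractor for masks
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),# RoI Align with a 14x14 output feature map
            out_channels=256,                                           # number of output channels of the extracted features
            featmap_strides=[4, 8, 16, 32]),                            # strides of the multi-scale feature maps
        mask_head=dict(                                                 # configuration of the mask head in the RoI head
            type='FCNMaskHead',
            num_convs=4,                                                # number of convolution layers in the mask head
            in_channels=256,                                            # input channels, equal to out_channels of mask_roi_extractor
            conv_out_channels=256,                                      # number of output channels of the convolution layers
            num_classes=80,                                             # number of classes; set it to match your own dataset
            loss_mask=dict(
                type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),# mask loss: CrossEntropyLoss computed on the masks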
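    # model training and testing settings
    train_cfg=dict(                                         # training hyperparameters for the RPN and RCNN
        rpn=dict(
            assigner=dict(                                  # configuration of the assigner that labels positive and negative samples
                type='MaxIoUAssigner',                      # use MaxIoUAssigner
                pos_iou_thr=0.7,                            # IoU >= 0.7 counts as a positive sample
                neg_iou_thr=0.3,                            # IoU < 0.3 counts as a negative sample
                min_pos_iou=0.3,                            # minimum IoU for a box to be taken as a positive sample
                match_low_quality=True,                     # whether to match low-quality boxes
                ignore_iof_thr=-1),                         # IoF threshold for ignoring bboxes; -1 means none are ignored
            sampler=dict(                                   # configuration of the positive/negative sampler
                type='RandomSampler',                       # use RandomSampler
                num=256,                                    # number of samples to draw
                pos_fraction=0.5,                           # fraction of positive samples among all samples
                neg_pos_ub=-1,                              # upper bound on negatives relative to the number of positives; extra negatives are dropped; -1 means no limit
                add_gt_as_proposals=False),                 # whether to add the GT boxes as proposals after sampling
            allowed_border=-1,                              # border allowed for valid anchors after padding; -1 disables the border check
            pos_weight=-1,                                  # weight of positive samples during training; -1 leaves it unchanged
            debug=False),                                   # whether to enable debug mode
        rpn_proposal=dict(                                  # configuration for generating proposals during training
            nms_pre=2000,                                   # number of boxes kept before non-maximum suppression (NMS)
            max_per_img=1000,                               # number of boxes kept after NMS
            nms=dict(type='nms', iou_threshold=0.7),        # NMS with a threshold of 0.7
            min_bbox_size=0),                               # minimum allowed box size
        rcnn=dict(
            assigner=dict(                                  # assigner that labels positive and negative samples for the RCNN
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,                            # IoU >= 0.5 counts as a positive sample
                neg_iou_thr=0.5,                            # IoU < 0.5 counts as a negative sample
                min_pos_iou=0.5,                            # minimum IoU for a box to be taken as a positive sample
                match_low_quality=True,                     # whether to match low-quality boxes
                ignore_iof_thr=-1),                         # IoF threshold for ignoring bboxes; -1 means none are ignored
            sampler=dict(                                   # configuration of the positive/negative sampler
                type='RandomSampler',                       # use RandomSampler
                num=512,                                    # number of samples to draw
                pos_fraction=0.25,                          # fraction of positive samples among all samples
                neg_pos_ub=-1,                              # upper bound on negatives relative to the number of positives; extra negatives are dropped; -1 means no limit
                add_gt_as_proposals=True),                  # whether to add the GT boxes as proposals after sampling
            mask_size=28,                                   # size of the mask
            pos_weight=-1,                                  # weight of positive samples during training; -1 leaves it unchanged
            debug=False)),                                  # whether to enable debug mode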
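    test_cfg=dict(                                          # testing hyperparameters for the RPN and RCNN
        rpn=dict(
            nms_pre=1000,                                   # number of boxes kept before non-maximum suppression (NMS)
            max_per_img=1000,                               # number of boxes kept after NMS
            nms=dict(type='nms', iou_threshold=0.7),        # NMS with a threshold of 0.7
            min_bbox_size=0),                               # minimum allowed box size
        rcnn=dict(
            score_thr=0.05,                                 # score threshold for bboxes
            nms=dict(type='nms', iou_threshold=0.5),        # NMS with a threshold of 0.5
            max_per_img=100,                                # number of boxes kept after NMS
            mask_thr_binary=0.5)))                          # threshold for binarizing the predicted mask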

Related articles in this series:
Playing with MMDetection: dataset files, training schedule files, runtime files, and the interpretation of their parameters in MMDetection (2)
Playing with MMDetection: making your own configuration files in MMDetection (3)

Origin blog.csdn.net/weixin_42715977/article/details/130003440