Detailed explanation of FCOS official code (2): Architecture (head)

Covering the head in the previous article would have made it too long, so I am writing it up separately. (Previous article: Detailed explanation of FCOS official code (1): Architecture (backbone).)
This article continues with the fcos_head part of the architecture; throughout, keep this picture in mind:
The network architecture of FCOS

fcos_head

When the class GeneralizedRCNN is initialized there is still the line self.rpn = build_rpn(cfg, self.backbone.out_channels). The attribute name was simply never changed: the structure actually built here is the FCOS head, and build_rpn ends up returning build_fcos(cfg, in_channels). The code lives in fcos_core/modeling/rpn/fcos/fcos.py, and build_fcos just returns an FCOSModule:

def build_fcos(cfg, in_channels):
    return FCOSModule(cfg, in_channels)

Take a look at the initialization part of FCOSModule()

class FCOSModule(torch.nn.Module):
    """
    Module for FCOS computation. Takes feature maps from the backbone and
    FCOS outputs and losses. Only Test on FPN now.
    """

    def __init__(self, cfg, in_channels):
        super(FCOSModule, self).__init__()

        head = FCOSHead(cfg, in_channels)  # build the FCOS head

        box_selector_test = make_fcos_postprocessor(cfg)

        loss_evaluator = make_fcos_loss_evaluator(cfg)
        self.head = head
        self.box_selector_test = box_selector_test
        self.loss_evaluator = loss_evaluator
        self.fpn_strides = cfg.MODEL.FCOS.FPN_STRIDES  # e.g. [8, 16, 32, 64, 128]

    def forward(self, images, features, targets=None):  # called as: self.rpn(images, features, targets)
        pass
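
The body is elided with pass here because it belongs with the loss and inference discussion of the next article. For orientation, its structure in the repo is roughly the following paraphrased sketch (compute_locations, _forward_train and _forward_test are FCOSModule's own methods in the same file):

def forward(self, images, features, targets=None):
    # run the shared head on every FPN level
    box_cls, box_regression, centerness = self.head(features)
    # the (x, y) points on the input image that each feature-map cell maps back to
    locations = self.compute_locations(features)
    if self.training:
        # training: evaluate the classification / regression / centerness losses
        return self._forward_train(
            locations, box_cls, box_regression, centerness, targets
        )
    # inference: decode boxes and run the post-processor (box_selector_test)
    return self._forward_test(
        locations, box_cls, box_regression, centerness, images.image_sizes
    )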

Now let's go back and take a look at FCOSHead:

class FCOSHead(torch.nn.Module):
    def __init__(self, cfg, in_channels):
        """
        Arguments:
            in_channels (int): number of channels of the input feature
            (this is the output channel count of each FPN level; as analyzed in the previous article, they are all the same, e.g. 256)
        """
        super(FCOSHead, self).__init__()
        # TODO: Implement the sigmoid version first.
        num_classes = cfg.MODEL.FCOS.NUM_CLASSES - 1              # e.g. 80 (COCO, background excluded)
        self.fpn_strides = cfg.MODEL.FCOS.FPN_STRIDES             # e.g. [8, 16, 32, 64, 128]
        self.norm_reg_targets = cfg.MODEL.FCOS.NORM_REG_TARGETS   # e.g. False: regress raw distances or stride-normalized ones
        self.centerness_on_reg = cfg.MODEL.FCOS.CENTERNESS_ON_REG # e.g. False: which tower the centerness branch shares features with
        self.use_dcn_in_tower = cfg.MODEL.FCOS.USE_DCN_IN_TOWER   # e.g. False: use deformable conv in the last tower layer

        cls_tower = []
        bbox_tower = []
        # e.g. cfg.MODEL.FCOS.NUM_CONVS = 4: the shared part of the head (also called the tower) has 4 conv layers
        for i in range(cfg.MODEL.FCOS.NUM_CONVS):
            if self.use_dcn_in_tower and \
                    i == cfg.MODEL.FCOS.NUM_CONVS - 1:
                conv_func = DFConv2d
            else:
                conv_func = nn.Conv2d

            # cls_tower and bbox_tower are each four 3×3, 256-channel conv layers, each followed by GroupNorm and ReLU
            cls_tower.append(
                conv_func(
                    in_channels,
                    in_channels,
                    kernel_size=3,
                    stride=1,
                    padding=1,
                    bias=True
                )
            )
            cls_tower.append(nn.GroupNorm(32, in_channels))
            cls_tower.append(nn.ReLU())
            bbox_tower.append(
                conv_func(
                    in_channels,
                    in_channels,
                    kernel_size=3,
                    stride=1,
                    padding=1,
                    bias=True
                )
            )
            bbox_tower.append(nn.GroupNorm(32, in_channels))
            bbox_tower.append(nn.ReLU())

        self.add_module('cls_tower', nn.Sequential(*cls_tower))
        self.add_module('bbox_tower', nn.Sequential(*bbox_tower))
        # cls_logits is the raw classification output of the network, shape [H×W×C] per level
        self.cls_logits = nn.Conv2d(
            in_channels, num_classes, kernel_size=3, stride=1,
            padding=1
        )
        # bbox_pred is the output of the regression branch, shape [H×W×4]
        self.bbox_pred = nn.Conv2d(
            in_channels, 4, kernel_size=3, stride=1,
            padding=1
        )
        # centerness is the branch that suppresses low-quality boxes, shape [H×W×1]
        self.centerness = nn.Conv2d(
            in_channels, 1, kernel_size=3, stride=1,
            padding=1
        )

        # initialization: initialize the conv parameters in all of these layers
        for modules in [self.cls_tower, self.bbox_tower,
                        self.cls_logits, self.bbox_pred,
                        self.centerness]:
            for l in modules.modules():
                if isinstance(l, nn.Conv2d):
                    torch.nn.init.normal_(l.weight, std=0.01)
                    torch.nn.init.constant_(l.bias, 0)

        # initialize the classification bias for focal loss, as in RetinaNet:
        # solving sigmoid(b) = prior_prob gives b = -log((1 - pi) / pi), so every
        # class score starts out near prior_prob (e.g. 0.01) and the huge number of
        # easy background locations cannot dominate the loss early in training
        prior_prob = cfg.MODEL.FCOS.PRIOR_PROB
        bias_value = -math.log((1 - prior_prob) / prior_prob)
        torch.nn.init.constant_(self.cls_logits.bias, bias_value)

        # one learnable scale factor per FPN level (P3-P7, five levels) to rescale the regression output
        self.scales = nn.ModuleList([Scale(init_value=1.0) for _ in range(5)])  

    def forward(self, x):
        logits = []
        bbox_reg = []
        centerness = []
        # x is the list of per-level FPN feature maps (the next line iterates over it)
        for l, feature in enumerate(x):
            # note: feature maps from different levels keep different spatial sizes after the tower
            # also note: all levels share one and the same tower, for both the cls branch and the bbox branch
            cls_tower = self.cls_tower(feature)
            box_tower = self.bbox_tower(feature)

            logits.append(self.cls_logits(cls_tower))
            # choose which tower's features the centerness branch reads, according to centerness_on_reg
            if self.centerness_on_reg:
                centerness.append(self.centerness(box_tower))
            else:
                centerness.append(self.centerness(cls_tower))

            bbox_pred = self.scales[l](self.bbox_pred(box_tower))  # rescale the raw regression output with this level's learnable Scale
            if self.norm_reg_targets:
                bbox_pred = F.relu(bbox_pred)
                if self.training:
                    bbox_reg.append(bbox_pred)
                else:
                    bbox_reg.append(bbox_pred * self.fpn_strides[l])
            else:
                bbox_reg.append(torch.exp(bbox_pred))
        return logits, bbox_reg, centerness
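
To make the output shapes concrete, here is a hypothetical smoke test (my own example, not from the repo; it assumes the fcos_core package is importable and uses the repo's default COCO config):

import torch
from fcos_core.config import cfg  # the repo's default config
from fcos_core.modeling.rpn.fcos.fcos import FCOSHead

# five dummy FPN levels (P3-P7) for a 512×512 input, 256 channels each
features = [torch.randn(1, 256, 512 // s, 512 // s) for s in (8, 16, 32, 64, 128)]

head = FCOSHead(cfg, in_channels=256)
head.eval()
with torch.no_grad():
    logits, bbox_reg, centerness = head(features)

for l in range(5):
    print(l, logits[l].shape, bbox_reg[l].shape, centerness[l].shape)
# level 0: [1, 80, 64, 64], [1, 4, 64, 64], [1, 1, 64, 64]; each later level halves H and W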

Two remarks on this forward code:

1. On why there is an exponential in the regression branch, the original paper says: "Moreover, since the regression targets are always positive, we employ exp(x) to map any real number to (0, +∞) on the top of the regression branch."

2. On the scaling of bbox_pred: the paper mentions it in only one place, when discussing how the head is shared across feature levels. Because the shared head must regress a different range of distances at each level, the regression prediction is multiplied by a per-level scaling factor; the paper writes this as using exp(s_i · x) with a trainable scalar s_i for feature level P_i. This factor is a tensor that is updated by backprop, i.e. it is learned; the classification branch, of course, needs no such factor. A minimal sketch of the Scale layer is given right below.
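
The Scale layer itself is imported from fcos_core/layers and is essentially nothing more than a learnable scalar; a minimal sketch consistent with its usage above:

import torch
from torch import nn

class Scale(nn.Module):
    """A single learnable scalar multiplier; FCOSHead keeps one per FPN level."""

    def __init__(self, init_value=1.0):
        super(Scale, self).__init__()
        # registered as a Parameter so the optimizer updates it with everything else
        self.scale = nn.Parameter(torch.FloatTensor([init_value]))

    def forward(self, input):
        return input * self.scale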

Here is the head part I printed out:

(rpn): FCOSModule(
    (head): FCOSHead(
      (cls_tower): Sequential(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): GroupNorm(32, 256, eps=1e-05, affine=True)
        (2): ReLU()
        (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (4): GroupNorm(32, 256, eps=1e-05, affine=True)
        (5): ReLU()
        (6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (7): GroupNorm(32, 256, eps=1e-05, affine=True)
        (8): ReLU()
        (9): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (10): GroupNorm(32, 256, eps=1e-05, affine=True)
        (11): ReLU()
      )
      (bbox_tower): Sequential(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): GroupNorm(32, 256, eps=1e-05, affine=True)
        (2): ReLU()
        (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (4): GroupNorm(32, 256, eps=1e-05, affine=True)
        (5): ReLU()
        (6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (7): GroupNorm(32, 256, eps=1e-05, affine=True)
        (8): ReLU()
        (9): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (10): GroupNorm(32, 256, eps=1e-05, affine=True)
        (11): ReLU()
      )
      (cls_logits): Conv2d(256, 80, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (bbox_pred): Conv2d(256, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (centerness): Conv2d(256, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (scales): ModuleList(
        (0): Scale()
        (1): Scale()
        (2): Scale()
        (3): Scale()
        (4): Scale()
      )
    )
    (box_selector_test): FCOSPostProcessor()
  )

At this point, the entire FCOS network structure is clear! As for the forward-propagation code of FCOSModule, I will leave it to the next article, together with the training part!

Article links

Previous: Detailed explanation of FCOS official code (1): Architecture (backbone)
Next: to be written

Original: blog.csdn.net/laizi_laizi/article/details/105519290