Object Detection Neck: FPN (Feature Pyramid Network) and PAN (with torch code)

Article directory

0. Preface
1. FPN
- 1.1 FPN core ideas and steps
- 1.2 Fusion process of FPN
2. PAN
Reference

FPN and PAN are both neck structures that address multi-scale detection in object detection; PAN builds on FPN to compensate for its shortcomings. Their principles and differences are introduced in detail below.

0. Preface

The structure of an object detector:

Input: Image, Patches, Image Pyramid
Backbones: VGG16, ResNet (ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152), SpineNet, EfficientNet-B0/B7, CSPResNeXt50, CSPDarknet53, MobileNet (v1, v2, v3), ShuffleNet (v1, v2), GhostNet
Neck:
    Additional blocks: SPP, ASPP, RFB, SAM
    Path-aggregation blocks: FPN, PAN, NAS-FPN, Fully-connected FPN, BiFPN, ASFF, SFAM
Heads:
    Dense Prediction (one-stage):
        RPN, SSD, YOLO, RetinaNet (anchor-based)
        CornerNet, CenterNet, MatrixNet, FCOS (FCOSv1, FCOSv2), ATSS, PAA (anchor-free)
    Sparse Prediction (two-stage):
        Faster R-CNN, R-FCN, Mask R-CNN (anchor-based)
        RepPoints (anchor-free)

Neck designs vary widely.

[Figure: four representative neck designs: (a) FPN, (b) PANet, (c) NAS-FPN, (d) BiFPN]

1. FPN

FPN (Feature Pyramid Network) is a method proposed by FAIR in 2017 to handle the multi-scale problem in object detection. Its main idea is to build a pyramid of feature maps so that objects of different sizes are detected on feature maps of a matching scale, which improves detection accuracy.
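To make the notion of pyramid levels concrete, the short sketch below lists the stride and feature-map size of each level. It assumes a backbone that halves the resolution at every stage, so level k has stride 2**k (the same assumption stated in the comments of the code listing later in this section). The function name `pyramid_level_shapes` and the 512x512 input are illustrative choices, not part of the original code.

```python
def pyramid_level_shapes(height, width, min_level=2, max_level=5):
    """Return {level: (stride, feat_h, feat_w)}, assuming stride 2**level."""
    shapes = {}
    for level in range(min_level, max_level + 1):
        stride = 2 ** level
        # Ceil division: padded stride-2 layers round odd sizes up
        feat_h = -(-height // stride)
        feat_w = -(-width // stride)
        shapes[level] = (stride, feat_h, feat_w)
    return shapes


# Hypothetical 512x512 input image
for level, (stride, h, w) in pyramid_level_shapes(512, 512).items():
    print(f"level {level}: stride {stride}, spatial scale 1/{stride}, size {h}x{w}")
```

Under these assumptions, levels 2 to 5 have spatial scales 1/4, 1/8, 1/16 and 1/32, which matches the `spatial_scale` example in the docstring of the `fpn` class.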


FPN is built from two pathways. The bottom-up pathway is the backbone's ordinary forward pass, which downsamples step by step from high-resolution to low-resolution feature maps. The top-down pathway starts from the coarsest, semantically strongest feature map and upsamples it step by step. At each level, the upsampled map is merged with the backbone map of the same resolution through a lateral connection (a 1x1 convolution followed by element-wise addition), and a 3x3 convolution then produces the output for that level. In this way the strong semantic information of the high-level feature maps is passed down to the high-resolution low-level maps, while the fine spatial detail of the low-level maps is preserved. Every level of the resulting pyramid carries both detail and semantics, which improves accuracy on multi-scale detection tasks at only a small additional computational cost.
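Before the full listing, the top-down fusion described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the implementation that follows: the class name `TinyFPN`, the ResNet-50-style channel widths (256, 512, 1024, 2048), the 256 output channels and the random input tensors are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyFPN(nn.Module):
    """Illustrative top-down FPN; not the implementation listed below."""

    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convs bring every backbone level to the same width
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels]
        )
        # 3x3 convs smooth each merged map
        self.smooth = nn.ModuleList(
            [nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
             for _ in in_channels]
        )

    def forward(self, feats):
        # feats: backbone maps ordered fine to coarse, e.g. [C2, C3, C4, C5]
        laterals = [conv(f) for conv, f in zip(self.lateral, feats)]
        # Walk from the coarsest level towards the finest one
        for i in range(len(laterals) - 1, 0, -1):
            upsampled = F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest"
            )
            # Element-wise addition merges top-down and lateral features
            laterals[i - 1] = laterals[i - 1] + upsampled
        return [conv(x) for conv, x in zip(self.smooth, laterals)]


# Random stand-ins for backbone outputs of a 256x256 image (strides 4 to 32)
feats = [
    torch.randn(1, c, 256 // s, 256 // s)
    for c, s in zip((256, 512, 1024, 2048), (4, 8, 16, 32))
]
outs = TinyFPN()(feats)
for level, out in zip(range(2, 6), outs):
    print(f"P{level}: {tuple(out.shape)}")
```

Every output has the same channel count but keeps the resolution of its backbone level, so a detection head can be shared across levels. Note that this sketch orders levels fine to coarse, whereas the listing below stores them coarse to fine.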

import collections
import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import init

from core.config import cfg
import utils.net as net_utils
import modeling.ResNet as ResNet
from modeling.generate_anchors import generate_anchors
from modeling.generate_proposals import GenerateProposalsOp
from modeling.collect_and_distribute_fpn_rpn_proposals import CollectAndDistributeFpnRpnProposalsOp
import nn as mynn

# Lowest and highest pyramid levels in the backbone network. For FPN, we assume
# that all networks have 5 spatial reductions, each by a factor of 2. Level 1
# would correspond to the input image, hence it does not make sense to use it.
LOWEST_BACKBONE_LVL = 2   # E.g., "conv2"-like level
HIGHEST_BACKBONE_LVL = 5  # E.g., "conv5"-like level


# ---------------------------------------------------------------------------- #
# FPN with ResNet
# ---------------------------------------------------------------------------- #

def fpn_ResNet50_conv5_body():
    return fpn(
        ResNet.ResNet50_conv5_body, fpn_level_info_ResNet50_conv5()
    )

def fpn_ResNet50_conv5_body_bup():
    return fpn(
        ResNet.ResNet50_conv5_body, fpn_level_info_ResNet50_conv5(),
        panet_buttomup=True
    )


def fpn_ResNet50_conv5_P2only_body():
    return fpn(
        ResNet.ResNet50_conv5_body,
        fpn_level_info_ResNet50_conv5(),
        P2only=True
    )


def fpn_ResNet101_conv5_body():
    return fpn(
        ResNet.ResNet101_conv5_body, fpn_level_info_ResNet101_conv5()
    )


def fpn_ResNet101_conv5_P2only_body():
    return fpn(
        ResNet.ResNet101_conv5_body,
        fpn_level_info_ResNet101_conv5(),
        P2only=True
    )


def fpn_ResNet152_conv5_body():
    return fpn(
        ResNet.ResNet152_conv5_body, fpn_level_info_ResNet152_conv5()
    )


def fpn_ResNet152_conv5_P2only_body():
    return fpn(
        ResNet.ResNet152_conv5_body,
        fpn_level_info_ResNet152_conv5(),
        P2only=True
    )


# ---------------------------------------------------------------------------- #
# Functions for bolting FPN onto a backbone architecture
# ---------------------------------------------------------------------------- #
class fpn(nn.Module):
    """Add FPN connections based on the model described in the FPN paper.

    fpn_output_blobs is in reversed order: e.g [fpn5, fpn4, fpn3, fpn2]
    similarly for fpn_level_info.dims: e.g [2048, 1024, 512, 256]
    similarly for spatial_scale: e.g [1/32, 1/16, 1/8, 1/4]
    """
    def __init__(self, conv_body_func, fpn_level_info, P2only=False, panet_buttomup=False):
        super().__init__()
        self.fpn_level_info = fpn_level_info
        self.P2only = P2only
        self.panet_buttomup = panet_buttomup

        self.dim_out = fpn_dim = cfg.FPN.DIM
        min_level, max_level = get_min_max_levels()
        self.num_backbone_stages = len(fpn_level_info.blobs) - (min_level - LOWEST_BACKBONE_LVL)
        fpn_dim_lateral = fpn_level_info.dims
        self.spatial_scale = []  # a list of scales for FPN outputs

        #
        # Step 1: recursively build down starting from the coarsest backbone level
        #
        # For the coarsest backbone level: 1x1 conv only seeds recursion
        if cfg.FPN.USE_GN:
            self.conv_top = nn.Sequential(
                nn.Conv2d(fpn_dim_lateral[0], fpn_dim, 1, 1, 0, bias=False),
                nn.GroupNorm(net_utils.get_group_gn(fpn_dim), fpn_dim,
                             eps=cfg.GROUP_NORM.EPSILON)
            )
        else:
            self.conv_top = nn.Conv2d(fpn_dim_lateral[0], fpn_dim, 1, 1, 0)
        self.topdown_lateral_modules = nn.ModuleList()
        self.posthoc_modules = nn.ModuleList()

        # For other levels add top-down and lateral connections
        for i in range(self.num_backbone_stages - 1):
            self.topdown_lateral_modules.append(
                topdown_lateral_module(fpn_dim, fpn_dim_lateral[i+1])
            )

        # Post-hoc scale-specific 3x3 convs
        for i in range(self.num_backbone_stages):
            if cfg.FPN.USE_GN:
                self.posthoc_modules.append(nn.Sequential(
                    nn.Conv2d(fpn_dim, fpn_dim, 3, 1, 1, bias=False),
                    nn.GroupNorm(net_utils.get_group_gn(fpn_dim), fpn_dim,
                                 eps=cfg.GROUP_NORM.EPSILON)
                ))
            else:
                self.posthoc_modules.append(
                    nn.Conv2d(fpn_dim, fpn_dim, 3, 1, 1)
                )

            self.spatial_scale.append(fpn_level_info.spatial_scales[i])

        # Added for the PANet bottom-up path
        if self.panet_buttomup:
            self.panet_buttomup_conv1_modules = nn.ModuleList()
            self.panet_buttomup_conv2_modules = nn.ModuleList()
            for i in range(self.num_backbone_stages - 1):
                if cfg.FPN.USE_GN:
                    self.panet_buttomup_conv1_modules.append(nn.Sequential(
                        nn.Conv2d(fpn_dim, fpn_dim, 3, 2, 1, bias=True),
                        nn.GroupNorm(net_utils.get_group_gn(fpn_dim), fpn_dim,
                                    eps=cfg.GROUP_NORM.EPSILON),
                        nn.ReLU(inplace=True)
                    ))
                    self.panet_buttomup_conv2_modules.append(nn.Sequential(
                        nn.Conv2d(fpn_dim, fpn_dim, 3, 1, 1, bias=True),
                        nn.GroupNorm(net_utils.get_group_gn(fpn_dim), fpn_dim,
                                    eps=cfg.GROUP_NORM.EPSILON),
                        nn.ReLU(inplace=True)
                    ))
                else:
                    self.panet_buttomup_conv1_modules.append(
                        nn.Conv2d(fpn_dim, fpn_dim, 3, 2, 1)
                    )
                    self.panet_buttomup_conv2_modules.append(
                        nn.Conv2d(fpn_dim, fpn_dim, 3, 1, 1)