Neck end design of YOLOv5

In the previous article, "Backbone Design of YOLOv5", we started from the backbone section of the YOLOv5 configuration file and walked through the backbone architecture together with the source code and structure of each module, which gave us a fairly complete first picture of the backbone network. Here we follow the same approach and dig further into the source code of the network structure to explore the design of the YOLOv5 neck.

1 Neck structure overview

[Figure: network structure diagram]

The network configuration file does not separate the neck from the head; both are listed under the single key head, which also makes them convenient to load in models/yolo.py. To make the neck design easier to follow, this article discusses only the neck portion of head:

neck:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)
  ]

Compared with the backbone, the neck is built from relatively few components: CBS (Conv), Upsample, Concat, and the CSP module without shortcut connections (C3 with shortcut=False).
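
To make the data flow through this configuration concrete, the following pure-Python sketch traces the channel count and stride of every neck layer. It is an illustration, not the parser in models/yolo.py: the function name trace_neck is hypothetical, the backbone outputs (layers 4, 6 and 9 with 256, 512 and 1024 channels) are assumed from a configuration with width multiple 1.0, and C3 is treated as simply producing its stated number of output channels.

```python
# Assumed backbone outputs that the neck reads: layer index -> (channels, stride).
# These values presume width_multiple = 1.0 (they are not taken from this article).
BACKBONE = {4: (256, 8), 6: (512, 16), 9: (1024, 32)}

# The neck rows from the configuration above: (from, module, args).
NECK = [
    (-1, "Conv", [512, 1, 1]),
    (-1, "Upsample", [None, 2, "nearest"]),
    ([-1, 6], "Concat", [1]),
    (-1, "C3", [512, False]),
    (-1, "Conv", [256, 1, 1]),
    (-1, "Upsample", [None, 2, "nearest"]),
    ([-1, 4], "Concat", [1]),
    (-1, "C3", [256, False]),
    (-1, "Conv", [256, 3, 2]),
    ([-1, 14], "Concat", [1]),
    (-1, "C3", [512, False]),
    (-1, "Conv", [512, 3, 2]),
    ([-1, 10], "Concat", [1]),
    (-1, "C3", [1024, False]),
]


def trace_neck(neck, backbone, first_index=10):
    """Return {layer index: (channels, stride)} for the backbone taps and every neck layer."""
    shapes = dict(backbone)
    for offset, (frm, module, args) in enumerate(neck):
        i = first_index + offset
        sources = frm if isinstance(frm, list) else [frm]
        # Negative entries are relative to the current layer; others are absolute.
        sources = [i + f if f < 0 else f for f in sources]
        inputs = [shapes[s] for s in sources]
        channels, stride = inputs[0]
        if module == "Conv":        # args = [out_channels, kernel, conv_stride]
            channels, stride = args[0], stride * args[2]
        elif module == "Upsample":  # args = [size, scale_factor, mode]
            stride //= args[1]
        elif module == "Concat":    # channels add up; resolutions must match
            assert len({s for _, s in inputs}) == 1, "mismatched resolutions"
            channels = sum(c for c, _ in inputs)
        elif module == "C3":        # args = [out_channels, shortcut]
            channels = args[0]
        shapes[i] = (channels, stride)
    return shapes


shapes = trace_neck(NECK, BACKBONE)
for i in range(10, 24):
    channels, stride = shapes[i]
    print(f"layer {i}: {channels} channels, stride {stride}")
```

Under these assumptions, layers 17, 20 and 23 come out at strides 8, 16 and 32, which matches the P3, P4 and P5 comments in the configuration.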
FPN and PAN
[Figure: (a) FPN, (b) PAN]

The neck also follows the FPN+PAN design. FPN uses a top-down pathway with lateral connections to build feature maps with high-level semantics at every scale, forming the classic feature pyramid, shown as (a) in the figure above. For details, see my earlier article: FPN Network structure + source code explanation. PAN is a straightforward extension of this idea: after passing through the many intermediate layers of FPN, the low-level information about object location has become very blurred, so PAN adds a bottom-up path to recover and strengthen the localization information, shown as (b) in the figure above. The original papers for FPN and PAN are:
FPN: Feature Pyramid Networks for Object Detection
PAN: Path Aggregation Network for Instance Segmentation

2 Neck part modules

2.1 CBS

In the backbone, CBS changes the number of channels of the feature map and, through the convolution stride s, also downsamples it, so that deeper layers extract more information from the image. On the left side of the neck, the top-down FPN path upsamples the feature maps, so downsampling would be inappropriate there and the CBS uses s=1. On the right side, the bottom-up PAN path reinforces location information and uses CBS to downsample again while extracting high-level semantic information, so s=2. This is why the CBS arguments differ between the two halves of the neck ([512, 1, 1] versus [256, 3, 2] in the configuration). For the CBS structure itself, see the previous article, Detailed explanation of Backbone of YOLOv5; it is not repeated here.
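
The effect of the stride on the feature-map size can be checked with the standard convolution output-size formula. The sketch below is plain arithmetic rather than YOLOv5 code: the helper name conv_out_size is hypothetical, and it assumes a padding of k // 2, which is my understanding of the default padding used by YOLOv5's Conv module.

```python
def conv_out_size(size, k, s, p=None):
    """Spatial output size of a convolution with kernel k, stride s, padding p."""
    if p is None:
        p = k // 2  # assumed "same"-style padding for odd kernels
    return (size + 2 * p - k) // s + 1


# FPN side: Conv args [512, 1, 1] -> k=1, s=1 keeps the resolution.
fpn_size = conv_out_size(20, k=1, s=1)

# PAN side: Conv args [256, 3, 2] -> k=3, s=2 halves the resolution.
pan_size = conv_out_size(80, k=3, s=2)

print(fpn_size, pan_size)  # 20 40
```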

2.2 nn.Upsample

[-1, 1, nn.Upsample, [None, 2, 'nearest']]

This is PyTorch's built-in upsampling module; you specify the scale factor and the interpolation method.

Here the output size is left unspecified (None), the scale factor is 2, and the mode is nearest, that is, nearest-neighbor interpolation.
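
As a small check of what these arguments do, the sketch below applies nn.Upsample with the same three arguments to a 2x2 feature map. The input values are made up for illustration.

```python
import torch
import torch.nn as nn

# Same arguments as in the config row: size=None, scale_factor=2, mode='nearest'.
up = nn.Upsample(None, 2, 'nearest')

x = torch.tensor([[[[1.0, 2.0],
                    [3.0, 4.0]]]])   # shape (N=1, C=1, H=2, W=2)
y = up(x)

print(y.shape)   # torch.Size([1, 1, 4, 4])
print(y[0, 0])   # each input value is repeated in a 2x2 block
```

The channel count is untouched; only the height and width double, which is what lets the upsampled map be concatenated with a backbone feature map of the same resolution.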

2.3 Concat

[[-1, 6], 1, Concat, [1]],  # cat backbone P4

Concat joins feature maps. The inputs to concatenate are given by the from field (here [-1, 6], meaning the previous layer and layer 6), and the dimension to concatenate along is given by args. Here the inputs are joined along dimension 1 (the channel dimension) and all other dimensions stay unchanged. To find out which feature maps the from indices refer to, I suggest running the model and printing the feature maps of each layer, for example:
[Figure: printed per-layer feature maps of the model]
With the printout above, it is easy to identify the two inputs of each Concat. I have also drawn the network structure, so you can find them by following the diagram as well.

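import torch
import torch.nn as nn


class Concat(nn.Module):
    # Concatenate a list of tensors along a given dimension
    def __init__(self, dimension=1):
        super().__init__()
        self.d = dimension  # dimension to join along (1 = channels for NCHW tensors)

    def forward(self, x):
        # x is a list of tensors that match in every dimension except self.d
        return torch.cat(x, self.d)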
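
For a concrete picture of the row [[-1, 6], 1, Concat, [1]], the sketch below calls torch.cat directly, which is all that Concat.forward does. The tensor shapes are assumptions for illustration (512 channels on both inputs at a 40x40 resolution); they are not printed from a real model.

```python
import torch

# Two feature maps at the same resolution (assumed shapes: N, C, H, W).
upsampled = torch.zeros(1, 512, 40, 40)    # stand-in for the upsampled neck feature
backbone_p4 = torch.ones(1, 512, 40, 40)   # stand-in for the backbone P4 feature

# Equivalent to Concat(dimension=1)([upsampled, backbone_p4]).
merged = torch.cat([upsampled, backbone_p4], 1)

print(merged.shape)  # torch.Size([1, 1024, 40, 40])
```

The channels of the two inputs are stacked in order, so the first 512 channels of merged come from the upsampled feature and the last 512 from the backbone feature.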

2.4 CSP/C3

The backbone needs a deep network to gather as much information as possible, and by the time the features reach the neck, the main feature extraction has already been done. There is therefore no need to keep blindly deepening the network in the neck, and the C3 module without residual (shortcut) connections is arguably the more appropriate choice here.
For the module structure and source code, refer to the introduction in the backbone article.
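
To illustrate what "without residuals" means, here is a deliberately simplified bottleneck in which the shortcut can be switched off. It is a sketch, not the YOLOv5 source: the class name TinyBottleneck is hypothetical, plain nn.Conv2d layers stand in for the CBS blocks, and the channel split is an arbitrary choice.

```python
import torch
import torch.nn as nn


class TinyBottleneck(nn.Module):
    """Simplified stand-in for a bottleneck block (hypothetical, for illustration)."""

    def __init__(self, channels, shortcut=True):
        super().__init__()
        hidden = channels // 2
        self.cv1 = nn.Conv2d(channels, hidden, kernel_size=1)
        self.cv2 = nn.Conv2d(hidden, channels, kernel_size=3, padding=1)
        self.add = shortcut  # whether to add the input back onto the output

    def forward(self, x):
        y = self.cv2(self.cv1(x))
        return x + y if self.add else y


torch.manual_seed(0)
x = torch.randn(1, 8, 16, 16)

with_shortcut = TinyBottleneck(8, shortcut=True)
without_shortcut = TinyBottleneck(8, shortcut=False)
# Copy the weights so the only difference is the shortcut itself.
without_shortcut.load_state_dict(with_shortcut.state_dict())

out_res = with_shortcut(x)
out_plain = without_shortcut(x)
print(out_res.shape, out_plain.shape)
```

With identical weights, the two outputs differ by exactly the input x, which is the identity path that shortcut=False removes.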

That covers the main content of the neck.

OVER
