Deep learning - dilated convolution

1. Preface
Dilated convolution (also called atrous convolution or expanded convolution) injects holes into a standard convolution kernel in order to enlarge the model's receptive field. Compared with an ordinary convolution, a dilated convolution has one extra parameter: the dilation rate, which specifies the spacing between the points of the convolution kernel. An ordinary convolution, for example, is simply a dilated convolution with a dilation rate of 1.
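As a concrete illustration (my own minimal PyTorch sketch; the post itself contains no code), the dilation rate appears as the `dilation` argument of an ordinary 2-D convolution, and it changes how large a region the same 9 weights cover:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)                         # dummy input: batch=1, 3 channels, 32x32

normal = nn.Conv2d(3, 8, kernel_size=3, dilation=1)   # ordinary convolution: dilation rate 1
dilated = nn.Conv2d(3, 8, kernel_size=3, dilation=2)  # same 9 weights, spread apart by one hole

print(normal(x).shape)   # torch.Size([1, 8, 30, 30]) -> kernel spans 3x3 pixels
print(dilated(x).shape)  # torch.Size([1, 8, 28, 28]) -> kernel spans 5x5 pixels (3 + (3-1)*(2-1) = 5)
```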
In a CNN, most layers are Conv and Pooling layers, and these two are very important building blocks. Generally speaking, for an image classification task, a backbone stacked out of Conv and Pooling layers already has good feature extraction ability; the most classic example of such a stack is VGG. After the image is fed into the network, Conv layers extract features while Pooling layers aggregate them, give the model a certain degree of translation invariance, and reduce the computation required by the subsequent convolutional layers. Finally, a fully connected layer outputs the classification result.
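To make the Conv + Pooling + fully connected pipeline concrete, here is a toy sketch in PyTorch (a simplified stand-in with made-up channel sizes, not the real VGG configuration):

```python
import torch
import torch.nn as nn

# Conv extracts features, MaxPool aggregates and halves the resolution.
backbone = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                          # 32x32 -> 16x16
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                          # 16x16 -> 8x8
)
classifier = nn.Linear(128 * 8 * 8, 10)       # fully connected layer for a 10-class task

x = torch.randn(1, 3, 32, 32)
feat = backbone(x)                            # [1, 128, 8, 8]
logits = classifier(feat.flatten(1))          # [1, 10]
print(feat.shape, logits.shape)
```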
However, this structure runs into problems in object detection and image segmentation:

  • The receptive field matters a great deal in object detection and image segmentation. Object detection, for example, usually predicts from the last feature map, so the number of input pixels that a single point on that feature map covers sets an upper limit on the object sizes the network can handle, and enlarging the receptive field relies on downsampling. The side effect of downsampling is that small objects become hard to detect;
  • To mitigate this, prediction branches can be drawn from feature maps at several depths, because small objects are easier to capture on earlier, higher-resolution feature maps; but those earlier feature maps carry weaker semantic information. SSD, for example, suffers from exactly this problem;
  • Skipping downsampling and simply stacking more convolutional layers does not help either: it increases the network's computation, feature extraction suffers without Pooling to aggregate information, and the receptive field grows only slowly (the sketch after this list makes these trade-offs concrete).
    So is there a way to enlarge the receptive field without sacrificing the size of the feature map?
    Dilated convolution does exactly that.
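The trade-offs in the list above can be checked with a small receptive-field calculator (my own sketch; the layer settings are illustrative, not taken from any particular network):

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride, dilation) tuples, applied in order."""
    rf, jump = 1, 1
    for k, s, d in layers:
        k_eff = k + (k - 1) * (d - 1)   # effective kernel span after inserting holes
        rf += (k_eff - 1) * jump        # grow the receptive field on the original input
        jump *= s                       # spacing between neighbouring outputs on the input
    return rf

conv3 = (3, 1, 1)   # plain 3x3 conv, stride 1
pool2 = (2, 2, 1)   # 2x2 pooling, stride 2

print(receptive_field([conv3, conv3, conv3]))              # 7  -> plain convs grow the RF slowly
print(receptive_field([conv3, pool2, conv3, pool2]))       # 10 -> downsampling grows the RF but shrinks the map
print(receptive_field([(3, 1, 1), (3, 1, 2), (3, 1, 4)]))  # 15 -> dilated stack: large RF at full resolution
```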

2. The principle of dilated convolution
(Figure: (a) standard 3 × 3 convolution; (b) 3 × 3 convolution with dilation rate 2; (c) 3 × 3 convolution with dilation rate 4.)
The operation of dilated convolution is easy to understand. In the figure above, (a) is the basic 3 × 3 convolution kernel, and dilated convolution simply inserts intervals into this basic kernel. (b) corresponds to a 3 × 3 convolution with dilation rate = 2: there is an interval of 1 between adjacent kernel points, so the 9 weights are spread over a 5 × 5 region, and (as in the figure, where it is applied on top of (a)) each output element corresponds to a 7 × 7 patch of the original image. You can think of the kernel as having been enlarged, with only 9 positions carrying parameters and all remaining positions fixed to 0: only the parameterized positions are multiplied with the corresponding pixels of the input feature map, and the rest are skipped. (c) is similar to (b), except that dilation rate = 4, which enlarges the receptive field to 15 × 15.
As the effective kernel size grows with the dilation rate, the receptive field naturally grows as well.
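The "only 9 points have parameters, the rest are 0" reading can be verified numerically. The following sketch (my own, using dilation rate 2 for brevity) shows that a dilated 3 × 3 convolution gives the same output as an ordinary convolution with a zero-inserted 5 × 5 kernel:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 16, 16)
w = torch.randn(1, 1, 3, 3)                   # the 9 actual parameters

out_dilated = F.conv2d(x, w, dilation=2)      # 3x3 convolution with dilation rate 2

w_expanded = torch.zeros(1, 1, 5, 5)          # enlarged kernel, all zeros...
w_expanded[:, :, ::2, ::2] = w                # ...except the 9 positions holding the real weights

out_expanded = F.conv2d(x, w_expanded)        # ordinary convolution with the zero-inserted kernel
print(torch.allclose(out_dilated, out_expanded))  # True
```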

Reference
How to understand dilated convolution (dilated convolution)

Original article: blog.csdn.net/weixin_40826634/article/details/128200543