Dilated Convolutions

1. Dilated convolution

Dilated convolution (also called atrous or "hole" convolution) differs from ordinary convolution by one extra hyperparameter, the dilation rate, which specifies the spacing between the kernel's sampling points. A dilated convolution uses a kernel of the same size as an ordinary convolution, so the number of parameters in the network is unchanged; the difference is that the dilated kernel covers a larger receptive field. The receptive field is the region of the input that a kernel "sees", i.e. how many pixels of the original image contribute to one pixel of the output feature map. For example, an ordinary 3×3 kernel has a receptive field of 3×3 = 9 pixels.
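The relationship between kernel size and dilation rate can be sketched in a few lines of Python (`effective_kernel_size` is a helper name chosen here for illustration, not a library function):

```python
def effective_kernel_size(k, rate):
    """Span of a k x k kernel with the given dilation rate:
    the k taps are spread `rate` pixels apart, so the kernel
    covers k + (k - 1) * (rate - 1) input pixels per dimension."""
    return k + (k - 1) * (rate - 1)

# An ordinary 3x3 kernel (rate 1) spans 3 pixels per dimension: 3x3 = 9.
print(effective_kernel_size(3, 1))  # 3
# The same 9-parameter kernel with rate 2 spans a 5x5 window.
print(effective_kernel_size(3, 2))  # 5
# With rate 4 it spans a 9x9 window, still with only 9 parameters.
print(effective_kernel_size(3, 4))  # 9
```

The parameter count never changes; only the spacing of the taps does.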

2. Schematic diagram

The figure below is a schematic diagram of dilated convolution.

[Figure: receptive fields of stacked dilated convolutions with dilation rates 1, 2, and 4]

(a) Ordinary convolution (1-dilated): the receptive field of each output pixel is 3×3 = 9.

(b) 2-dilated convolution applied on top of (a): the receptive field grows to 7×7 = 49.

(c) 4-dilated convolution applied on top of (b): the receptive field grows to 15×15 = 225.

As the figure shows, the number of kernel parameters stays the same at every layer, while the receptive field grows exponentially as the dilation rate doubles from layer to layer.
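The figure's numbers can be verified with a short sketch (`stacked_rf` is an illustrative helper, assuming stride-1 layers stacked in the order shown):

```python
def stacked_rf(rates, k=3):
    """Receptive field (per dimension) of one output pixel after
    stacking k x k dilated convolutions with the given dilation rates."""
    rf = 1
    for r in rates:
        k_eff = k + (k - 1) * (r - 1)  # span of the dilated kernel
        rf += k_eff - 1                # each layer widens the RF by k_eff - 1
    return rf

print(stacked_rf([1]))        # 3  -> 3x3  = 9 pixels, panel (a)
print(stacked_rf([1, 2]))     # 7  -> 7x7  = 49,       panel (b)
print(stacked_rf([1, 2, 4]))  # 15 -> 15x15 = 225,     panel (c)
```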

3. Advantages

Dilated convolution enlarges the kernel's receptive field while keeping the number of parameters constant, and, with appropriate padding, the output feature map keeps the same spatial size as the input. A 3×3 kernel with dilation rate 2 has the same 5×5 receptive field as a 5×5 kernel, but uses only 9 parameters instead of 25, i.e. 36% of the parameter count.
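Whether the output size is preserved depends on the padding. A minimal sketch using the standard convolution output-size formula, assuming "same"-style padding of rate × (k − 1) / 2 for an odd kernel:

```python
def conv_out_size(n, k=3, rate=1, stride=1, pad=None):
    """Output size of a 1-D convolution over n input pixels.
    If pad is None, use "same" padding for an odd kernel,
    pad = rate * (k - 1) // 2, which keeps the size unchanged."""
    if pad is None:
        pad = rate * (k - 1) // 2
    return (n + 2 * pad - rate * (k - 1) - 1) // stride + 1

# A 32-pixel input stays 32 pixels wide for any dilation rate.
print(conv_out_size(32, rate=1))  # 32
print(conv_out_size(32, rate=2))  # 32
print(conv_out_size(32, rate=4))  # 32
```

This is why dilated convolutions suit dense prediction tasks such as segmentation: the receptive field grows while the feature map resolution is preserved.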

The receptive field can also be enlarged with pooling layers or by stacking ordinary convolutions. Pooling, however, shrinks the feature map at every step, so much spatial information is lost when the map is later upsampled. Stacking many 3×3, stride-1 convolutions keeps the feature map size while growing the receptive field, but the growth is only linear (2 pixels per layer), whereas with dilated convolutions the receptive field can grow exponentially in the number of layers.
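The linear-vs-exponential contrast can be made concrete (a sketch, assuming stride-1 3×3 layers and doubling dilation rates):

```python
def rf_after(rates, k=3):
    """Receptive field after stacking k x k convs with these dilation rates."""
    rf = 1
    for r in rates:
        rf += (k + (k - 1) * (r - 1)) - 1  # grow by the dilated kernel span - 1
    return rf

n = 5
plain = rf_after([1] * n)                       # n ordinary 3x3 convs
dilated = rf_after([2 ** i for i in range(n)])  # rates 1, 2, 4, 8, 16
print(plain)    # 11 : linear growth, 2n + 1
print(dilated)  # 63 : exponential growth, 2^(n+1) - 1
```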

4. Application

Dilated convolutions have applications in image segmentation, speech synthesis, machine translation, and object detection.

5. The grid problem of dilated convolution

When several dilated convolutions with the same rate are stacked, the kernel samples the input on a sparse, regular grid: neighboring output pixels are computed from disjoint sets of input pixels, and some input pixels are never used at all. This is known as the gridding (or checkerboard) problem. A common remedy is to mix dilation rates across layers (e.g. 1, 2, 3) so that the union of the sampling points covers the input densely.
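The gridding effect can be demonstrated in one dimension by tracking which input offsets can reach a single output pixel (`contributing_offsets` is an illustrative helper):

```python
def contributing_offsets(rates, k=3):
    """1-D input offsets that contribute to one output pixel
    after stacking k-tap dilated convolutions with the given rates."""
    offsets = {0}
    for r in rates:
        taps = [r * (i - k // 2) for i in range(k)]      # e.g. rate 2 -> -2, 0, 2
        offsets = {o + t for o in offsets for t in taps}  # compose the layers
    return sorted(offsets)

# Two stacked rate-2 convs touch only even offsets: odd pixels are skipped.
print(contributing_offsets([2, 2]))  # [-4, -2, 0, 2, 4]
# Mixing rates 1 and 2 covers every offset in the span.
print(contributing_offsets([1, 2]))  # [-3, -2, -1, 0, 1, 2, 3]
```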

6. References

Fisher Yu and Vladlen Koltun. Multi-Scale Context Aggregation by Dilated Convolutions. ICLR 2016.

Origin blog.csdn.net/pku_Coder/article/details/82776335