Atrous (dilated) convolution, also called hole convolution

1. Definition

Atrous convolution, also known as dilated convolution, was proposed for image semantic segmentation, where downsampling reduces image resolution and loses information. It adds a new parameter to the convolutional layer, the "dilation rate", which defines the spacing between the points of the convolution kernel: with dilation rate r, adjacent kernel points are r pixels apart. The same dilation rate applies along both the height and the width of the kernel.
An ordinary (standard) convolution has a dilation rate of 1.
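
To make the definition concrete, here is a minimal sketch in plain Python. The function name `dilated_conv2d` and the toy 5×5 input are illustrative assumptions, not part of any library. It computes a stride-1, unpadded cross-correlation (the operation deep learning frameworks usually call convolution) in which neighbouring kernel points are `rate` pixels apart.

```python
def dilated_conv2d(image, kernel, rate=1):
    """Naive 2-D dilated convolution (cross-correlation, stride 1, no padding)."""
    k_h, k_w = len(kernel), len(kernel[0])
    # Extent of the kernel on the input once its points are spread `rate` apart.
    span_h = rate * (k_h - 1) + 1
    span_w = rate * (k_w - 1) + 1
    out_h = len(image) - span_h + 1
    out_w = len(image[0]) - span_w + 1
    output = [[0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    # Neighbouring kernel points sample pixels `rate` apart.
                    total += kernel[i][j] * image[y + rate * i][x + rate * j]
            output[y][x] = total
    return output


# Toy input: a 5x5 image holding the values 0..24, and an all-ones 3x3 kernel.
image = [[row * 5 + col for col in range(5)] for row in range(5)]
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

standard = dilated_conv2d(image, kernel, rate=1)  # 3x3 output
dilated = dilated_conv2d(image, kernel, rate=2)   # 1x1 output, same 9 weights
```

With rate 1 the function behaves like an ordinary convolution. With rate 2 the same nine weights read every second pixel, so a single output value already sees the whole 5×5 input.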

2. Graphical illustration

[Figure: 3×3 convolution kernels with dilation rates 1, 2, and 4, shown in panels (a), (b), and (c)]

  • Figure (a) is a 3×3 1-dilated conv, which is identical to an ordinary convolution.
  • Figure (b) is a 3×3 2-dilated conv. The kernel still has 3×3 weights, but there is a hole of 1 between adjacent weights, and the holes are filled with 0. After filling, the kernel covers 5×5, and the convolution is performed with this expanded kernel.
  • Figure (c) is a 3×3 4-dilated conv. The hole between adjacent weights is 3, so the expanded kernel covers 9×9.
    Note: The green background in the figure is the receptive field of the expanded convolution kernel; the convolution is performed at the expanded kernel size.
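
The zero-filling described for Figure (b) can be sketched as follows. The helper name `dilate_kernel` is hypothetical, and the kernel values 1..9 are chosen only so that the original weights are easy to spot among the inserted zeros.

```python
def dilate_kernel(kernel, rate):
    """Spread the kernel weights `rate` apart and fill the holes with 0."""
    k_h, k_w = len(kernel), len(kernel[0])
    size_h = rate * (k_h - 1) + 1
    size_w = rate * (k_w - 1) + 1
    expanded = [[0] * size_w for _ in range(size_h)]
    for i in range(k_h):
        for j in range(k_w):
            expanded[rate * i][rate * j] = kernel[i][j]
    return expanded


kernel = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for row in dilate_kernel(kernel, 2):
    print(row)
# [1, 0, 2, 0, 3]
# [0, 0, 0, 0, 0]
# [4, 0, 5, 0, 6]
# [0, 0, 0, 0, 0]
# [7, 0, 8, 0, 9]
```

Real implementations normally skip the zero positions instead of materialising them, which is why the amount of computation stays the same as for the original 3×3 kernel.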

3. Formula

Expanded kernel size = dilation rate r × (original kernel size - 1) + 1
Hole interval between elements = dilation rate r - 1
Example:
Figure (a), dilation rate r=1 ----> hole interval between elements = 0 ----> expanded kernel size = 3×3
Figure (b), dilation rate r=2 ----> hole interval between elements = 1 ----> expanded kernel size = 5×5
Figure (c), dilation rate r=4 ----> hole interval between elements = 3 ----> expanded kernel size = 9×9
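
The two formulas are easy to check in code. The function names below are illustrative only; the loop reproduces the three cases from the example for a 3×3 kernel.

```python
def expanded_kernel_size(kernel_size, rate):
    """Side length of the kernel after dilation."""
    return rate * (kernel_size - 1) + 1


def hole_interval(rate):
    """Number of holes between two adjacent kernel elements."""
    return rate - 1


for rate in (1, 2, 4):
    size = expanded_kernel_size(3, rate)
    print(f"r={rate}: hole interval={hole_interval(rate)}, expanded kernel={size}x{size}")
# r=1: hole interval=0, expanded kernel=3x3
# r=2: hole interval=1, expanded kernel=5x5
# r=4: hole interval=3, expanded kernel=9x9
```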

4. Advantages and problems

Advantages:
(1) Atrous convolution has a larger receptive field, while the number of parameters and the amount of computation do not change;
(2) The internal data structure is preserved;
(3) Downsampling is avoided.
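
Advantage (1) can be illustrated with a small calculation. This is a sketch under simplifying assumptions: stride-1 layers, square kernels, and bias terms ignored. The function names and the channel count of 64 are made up for the example.

```python
def receptive_field(kernel_size, rates):
    """Receptive field (along one axis) of stacked stride-1 dilated convolutions."""
    field = 1
    for rate in rates:
        # Each layer widens the field by the extent of its dilated kernel minus 1.
        field += rate * (kernel_size - 1)
    return field


def weights_per_layer(kernel_size, in_channels, out_channels):
    """Weight count of one layer; the dilation rate does not appear in it."""
    return kernel_size * kernel_size * in_channels * out_channels


print(receptive_field(3, [1, 1, 1]))   # 7  (three ordinary 3x3 layers)
print(receptive_field(3, [1, 2, 4]))   # 15 (same layers with dilation rates 1, 2, 4)
print(weights_per_layer(3, 64, 64))    # 36864, whatever the dilation rate
```

Both stacks have the same number of weights, but the dilated stack sees a 15×15 region instead of 7×7.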
Problems:
(1) The kernel elements of an atrous convolution are not contiguous, so not all pixels take part in the computation. Information is sampled in a checkerboard pattern and its continuity is lost. This is the gridding effect: dilated convolution cannot cover all image features, as shown in the figure below.
[Figure: gridding effect of stacked dilated convolutions]
(2) Dilated convolution is designed to capture long-range information. However, a large dilation rate may only help with large objects and may hurt small ones. Handling objects of different sizes at the same time is the key to designing a dilated convolutional network.
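
The gridding effect in problem (1) can be reproduced in one dimension. The sketch below, with hypothetical helper names and an assumed odd kernel size, lists which input offsets can reach a single output position after stacking several dilated layers, and which offsets inside that range are never touched.

```python
def contributing_offsets(rates, kernel_size=3):
    """Input offsets (relative to the output position) that reach the output."""
    half = kernel_size // 2  # assumes an odd kernel size
    offsets = {0}
    for rate in rates:
        offsets = {o + rate * tap
                   for o in offsets
                   for tap in range(-half, half + 1)}
    return offsets


def missed_offsets(rates, kernel_size=3):
    """Offsets inside the receptive field that never contribute."""
    offsets = contributing_offsets(rates, kernel_size)
    return sorted(set(range(min(offsets), max(offsets) + 1)) - offsets)


print(missed_offsets([2, 2, 2]))  # [-5, -3, -1, 1, 3, 5]
print(missed_offsets([1, 2, 5]))  # []
```

With rates [2, 2, 2] every odd offset is skipped, which is the checkerboard pattern described above. With rates [1, 2, 5] no offset inside the receptive field is skipped.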

5. Hybrid Dilated Convolution (HDC)

Hybrid dilated convolution (HDC) was proposed to address the problems of dilated convolution. Compared with plain dilated convolution, it has three main characteristics:
(1) The dilation rates of the stacked convolutions must not share a common divisor greater than 1. For example, [2, 4, 6] is not a good choice for three layers, because the gridding effect still appears.
(2) The dilation rates follow a sawtooth pattern, such as the repeating structure [1, 2, 5, 1, 2, 5].
(3) The following formula must be satisfied:
$$M_i = \max\left[\, M_{i+1} - 2 r_i,\; M_{i+1} - 2\,(M_{i+1} - r_i),\; r_i \,\right]$$
Here r_i is the dilation rate of the i-th layer and M_i is the maximum dilation rate at the i-th layer. With n layers in total, M_n = r_n by default. If the convolution kernel is k × k, the goal is M_2 ≤ k, so that a layer with dilation rate = 1, that is, an ordinary convolution, can at least cover all the holes.
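
The three conditions can be checked mechanically. The sketch below assumes that condition (3) is the recurrence M_i = max[M_(i+1) - 2·r_i, M_(i+1) - 2·(M_(i+1) - r_i), r_i] with M_n = r_n, as given in the HDC paper, and that condition (1) means the greatest common divisor of all rates in a group is 1. The function names are hypothetical, and `satisfies_hdc` expects at least two layers.

```python
from functools import reduce
from math import gcd


def max_gap_per_layer(rates):
    """Return [M_1, ..., M_n] from the assumed recurrence, with M_n = r_n."""
    gaps = [0] * len(rates)
    gaps[-1] = rates[-1]
    for i in range(len(rates) - 2, -1, -1):
        nxt, r = gaps[i + 1], rates[i]
        gaps[i] = max(nxt - 2 * r, nxt - 2 * (nxt - r), r)
    return gaps


def satisfies_hdc(rates, kernel_size):
    """Check condition (1) (no common divisor > 1) and the goal M_2 <= k."""
    no_common_divisor = reduce(gcd, rates) == 1
    m_2 = max_gap_per_layer(rates)[1]  # index 1 holds M_2; needs >= 2 layers
    return no_common_divisor and m_2 <= kernel_size


print(max_gap_per_layer([1, 2, 5]), satisfies_hdc([1, 2, 5], 3))  # [1, 2, 5] True
print(max_gap_per_layer([1, 2, 9]), satisfies_hdc([1, 2, 9], 3))  # [3, 5, 9] False
print(satisfies_hdc([2, 4, 6], 3))                                # False
```

For [1, 2, 5] the recurrence gives M_2 = 2 ≤ 3, so the rates are feasible for a 3×3 kernel. For [1, 2, 9] it gives M_2 = 5 > 3, and [2, 4, 6] already fails the common-divisor check.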

Example: dilation rate = [1, 2, 5] with a 3×3 kernel (feasible solution)
[Figure: coverage obtained with dilation rates [1, 2, 5] and a 3×3 kernel]

Origin blog.csdn.net/Chenjiahui_LYee/article/details/128603489