How Deep Learning Convolutions Operate

This post surveys several convolution variants used in deep learning models to reduce the number of parameters: group convolution, depthwise separable convolution, and dilated convolution.

  • Group convolution
    Principle: split the input feature map (C x W x H) into g groups along the channel axis (each of size C/g x W x H), run an independent convolution on each group (e.g. for a 3x3 kernel and k output channels in total, each group applies k/g filters of size C/g x 3 x 3), and finally fuse the results by concatenating the group outputs along the channel axis. Each group produces an output of size k/g x W2 x H2, and the total weight count is g x (k/g) x (C/g) x 3 x 3 = (k x C x 3 x 3) / g, i.e. 1/g of an ordinary convolution with k filters. A minimal sketch follows below.
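Here is a minimal PyTorch sketch of the parameter saving; the channel counts and group count are hypothetical choices for illustration:

```python
import torch
import torch.nn as nn

C_in, C_out, g = 64, 128, 4  # hypothetical channel counts and group count

# Ordinary convolution: C_out x C_in x 3 x 3 weights
standard = nn.Conv2d(C_in, C_out, kernel_size=3, padding=1)

# Group convolution: each of the g groups convolves C_in/g input channels
# with C_out/g filters, so the weight count drops by a factor of g
grouped = nn.Conv2d(C_in, C_out, kernel_size=3, padding=1, groups=g)

x = torch.randn(1, C_in, 32, 32)
print(standard(x).shape)  # torch.Size([1, 128, 32, 32])
print(grouped(x).shape)   # same output shape as the standard convolution

n_std = sum(p.numel() for p in standard.parameters())
n_grp = sum(p.numel() for p in grouped.parameters())
print(n_std, n_grp)  # grouped weights are roughly 1/g of the standard weights
```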
  • Depthwise separable convolution

Depthwise separable convolution is closely related to group convolution: its first stage is an extreme case of group convolution.

From the group-convolution perspective, the number of groups acts like a control knob. At its minimum value of 1, the operation is an ordinary convolution; at its maximum value, equal to the number of input channels, it becomes a depthwise convolution, also called channel-by-channel convolution. A depthwise separable convolution follows this depthwise step with a 1x1 pointwise convolution that mixes information across channels, as in the sketch below.
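As a sketch of how the two stages compose, here is a minimal PyTorch module; the class name and channel sizes are hypothetical, chosen only for illustration:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv (groups = in_channels) followed by a 1x1 pointwise conv."""
    def __init__(self, in_channels, out_channels, kernel_size=3, padding=1):
        super().__init__()
        # Depthwise stage: one spatial filter per input channel
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=padding, groups=in_channels)
        # Pointwise stage: 1x1 convolution mixes information across channels
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 64, 32, 32)
block = DepthwiseSeparableConv(64, 128)
print(block(x).shape)  # torch.Size([1, 128, 32, 32])
```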

  • Dilated convolution
    Dilated convolution was proposed for image semantic segmentation, where downsampling reduces image resolution and loses information. By inserting gaps ("holes") between kernel taps, it enlarges the receptive field: with the same number of parameters and computations, a 3x3 kernel covers a 5x5 region (dilation rate = 2) or larger, so downsampling can be avoided.

Also known as atrous convolution, it introduces a new hyperparameter to the convolutional layer called the dilation rate, which defines the spacing between the kernel's sample points as it processes the data: a dilation rate of r places adjacent kernel taps r pixels apart, and an ordinary convolution has a dilation rate of 1.

Under the same computational cost, dilated convolution provides a larger receptive field, and atrous convolutions are often used in real-time image segmentation. When a network layer needs a large receptive field but computing resources are limited, so the number or size of convolution kernels cannot be increased, dilated convolution is worth considering. A minimal sketch follows below.
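Here is a minimal PyTorch sketch of the receptive-field effect; the tensor sizes are hypothetical:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 7, 7)

# Ordinary 3x3 convolution: dilation rate 1, each output sees a 3x3 window
conv = nn.Conv2d(1, 1, kernel_size=3, dilation=1)

# Dilated 3x3 convolution: dilation rate 2 inserts one gap between kernel
# taps, so each output sees a 5x5 window with the same 9 weights
dilated = nn.Conv2d(1, 1, kernel_size=3, dilation=2)

print(conv(x).shape)     # torch.Size([1, 1, 5, 5])
print(dilated(x).shape)  # torch.Size([1, 1, 3, 3]): fewer outputs, larger window each
```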

Reference blogs:
https://www.jianshu.com/p/a936b7bc54e3
https://blog.csdn.net/u012426298/article/details/80853553

Original post: blog.csdn.net/ganbelieve/article/details/107965476