Deep Learning - Depthwise Separable Convolution

1. Depthwise separable convolution
Some lightweight networks, such as MobileNet, use depthwise separable convolution, which combines a depthwise (DW) convolution and a pointwise (PW) convolution to extract feature maps. Compared with a conventional convolution, its parameter count and computational cost are considerably lower.

2. Conventional convolution operation
For a 5×5×3 input, a convolution that produces a 4-channel output uses a kernel of shape 3×3×3×4. With no padding, the output feature map is 3×3×4; with padding=1, it is 5×5×4.
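The shape and cost arithmetic for this running example can be sketched in a few lines of Python. This is a minimal check, assuming stride 1 and the 5×5×3 input / four 3×3 kernels described above:

```python
# Shape and cost arithmetic for the running example: a 5x5x3 input,
# four 3x3 kernels, stride 1 (assumptions matching the text).
k = 3                      # kernel height/width
c_in, c_out = 3, 4         # input / output channels
h_in = 5                   # input height (width is the same)

h_out = h_in - k + 1               # no padding: 3
h_out_padded = h_in - k + 2*1 + 1  # padding=1: 5

n_std = k * k * c_in * c_out                   # parameters: 108
c_std = k * k * h_out * h_out * c_in * c_out   # multiply-accumulates: 972
print(h_out, h_out_padded, n_std, c_std)       # 3 5 108 972
```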
There are 4 filters in the convolutional layer, each containing 3 kernels of size 3×3. The parameter count of the layer is therefore (kernel W × kernel H × input channels × output channels):
N_std = 4 × 3 × 3 × 3 = 108
The computational cost is kernel W × kernel H × (input W − kernel W + 1) × (input H − kernel H + 1) × input channels × output channels. Here padding=0 is used for the demonstration, so the output is 3×3×4; with padding P, the output size becomes (input W − kernel W + 2P + 1) × (input H − kernel H + 2P + 1):
C_std = 3 × 3 × (5 − 2) × (5 − 2) × 3 × 4 = 972

3. Depthwise separable convolution
Depthwise separable convolution splits the operation into two stages: Depthwise Convolution and Pointwise Convolution.

  • Channel-by-channel convolution (Depthwise Convolution)

In Depthwise Convolution, each convolution kernel is responsible for exactly one channel, and each channel is convolved by exactly one kernel. The number of output feature-map channels is therefore identical to the number of input channels.
For a 5×5 pixel, three-channel color input image (shape 5×5×3), the depthwise convolution operates entirely within each two-dimensional plane. The number of kernels equals the number of channels in the previous layer (a one-to-one correspondence between channels and kernels), so the three-channel image produces three feature maps (with same padding, their size matches the input, 5×5). (The kernel shape is: kernel W × kernel H × input channels.)
Each filter contains a single 3×3 kernel, so the parameter count of this stage is (kernel W × kernel H × input channels):
N_depthwise = 3 × 3 × 3 = 27
The computational cost is (kernel W × kernel H × (input W − kernel W + 1) × (input H − kernel H + 1) × input channels):
C_depthwise = 3 × 3 × (5 − 2) × (5 − 2) × 3 = 243
The number of feature maps after Depthwise Convolution equals the number of input channels, so this stage cannot expand the channel dimension. Moreover, because each channel of the input is convolved independently, the feature information of different channels at the same spatial position is not combined. Pointwise Convolution is therefore needed to mix these feature maps and generate new ones.
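The channel-by-channel behavior can be made concrete with a small NumPy sketch. This is a naive loop implementation for illustration (stride 1, no padding assumed), not an optimized one:

```python
import numpy as np

def depthwise_conv2d(x, w):
    """Depthwise convolution: one kernel per input channel, stride 1,
    no padding. x: (H, W, C) input; w: (k, k, C) kernels, where
    kernel i convolves channel i only."""
    H, W, C = x.shape
    k = w.shape[0]
    out = np.zeros((H - k + 1, W - k + 1, C))
    for c in range(C):                  # channels stay fully independent
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                out[i, j, c] = np.sum(x[i:i+k, j:j+k, c] * w[:, :, c])
    return out

x = np.random.rand(5, 5, 3)   # 5x5, 3-channel input
w = np.random.rand(3, 3, 3)   # one 3x3 kernel per channel
y = depthwise_conv2d(x, w)
print(y.shape)  # (3, 3, 3): channel count unchanged
```

Note that the output channel count always equals the input's, which is exactly why a pointwise stage is needed afterwards.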

  • Pointwise Convolution

Pointwise Convolution is very similar to a conventional convolution, except that its kernel size is 1×1×M, where M is the number of channels in the previous layer. The convolution here forms a weighted combination of the previous step's feature maps along the depth dimension, producing one output feature map per kernel. (The kernel shape is: 1 × 1 × input channels × output channels.)
Since 1×1 convolutions are used, the parameter count of this stage is (1 × 1 × input channels × output channels):
N_pointwise = 1 × 1 × 3 × 4 = 12
The computational cost is (1 × 1 × feature-map W × feature-map H × input channels × output channels):
C_pointwise = 1 × 1 × 3 × 3 × 3 × 4 = 108
After Pointwise Convolution, 4 feature maps are output, matching the output dimensions of the conventional convolution.
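Because the kernel is 1×1, the pointwise stage reduces to a per-pixel matrix product over channels. A minimal NumPy sketch (the (3, 4) weight shape is the 1×1×3×4 kernel with the two unit dimensions dropped):

```python
import numpy as np

def pointwise_conv2d(x, w):
    """1x1 convolution: w has shape (C_in, C_out). Each output pixel is a
    weighted combination of the input channels at that spatial position."""
    return np.tensordot(x, w, axes=([2], [0]))

x = np.random.rand(3, 3, 3)   # e.g. the depthwise output from the previous step
w = np.random.rand(3, 4)      # 1x1x3x4 kernel, squeezed to (3, 4)
y = pointwise_conv2d(x, w)
print(y.shape)  # (3, 3, 4): 4 kernels -> 4 output feature maps
```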
4. Parameter comparison
Recall that the number of parameters of conventional convolution is:
N_std = 4 × 3 × 3 × 3 = 108
The parameters of Separable Convolution are obtained by adding two parts:
N_depthwise = 3 × 3 × 3 = 27
N_pointwise = 1 × 1 × 3 × 4 = 12
N_separable = N_depthwise + N_pointwise = 39
With the same input, we again obtain 4 feature maps, but Separable Convolution uses only about 1/3 of the parameters of conventional convolution (39 vs. 108). Under the same parameter budget, a network built with Separable Convolutions can therefore be made deeper.
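The two parameter counts generalize to any kernel size and channel counts. A short sketch of the general formulas, checked against the running example:

```python
def params_std(k, c_in, c_out):
    """Parameters of a standard k x k convolution."""
    return k * k * c_in * c_out

def params_separable(k, c_in, c_out):
    """Depthwise (k*k per channel) plus pointwise (1x1, c_in -> c_out)."""
    return k * k * c_in + c_in * c_out

print(params_std(3, 3, 4))        # 108
print(params_separable(3, 3, 4))  # 27 + 12 = 39
```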
5. Calculation comparison
Recall that the calculation of conventional convolution is:
C_std = 3 × 3 × (5 − 2) × (5 − 2) × 3 × 4 = 972
The calculation of Separable Convolution is obtained by adding two parts:
C_depthwise = 3 × 3 × (5 − 2) × (5 − 2) × 3 = 243
C_pointwise = 1 × 1 × 3 × 3 × 3 × 4 = 108
C_separable = C_depthwise + C_pointwise = 351
With the same input, we again obtain 4 feature maps, and Separable Convolution requires only about 1/3 of the computation of conventional convolution (351 vs. 972). Under the same computational budget, Depthwise Separable Convolution therefore allows the network to be made deeper.
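Both ratios (parameters and computation) reduce algebraically to the same expression, 1/c_out + 1/k², independent of the input size. A small sketch:

```python
def separable_ratio(k, c_out):
    """Cost of depthwise-separable conv relative to a standard k x k conv.
    Both the parameter and multiply-accumulate ratios reduce to
    1/c_out + 1/k**2 (input size and input channels cancel out)."""
    return 1 / c_out + 1 / k**2

# Running example: 3x3 kernels, 4 output channels.
print(separable_ratio(3, 4))    # ~0.361, i.e. 39/108 and 351/972
# With many output channels the saving approaches 1/k**2:
print(separable_ratio(3, 256))  # ~0.115, close to 1/9
```

This also explains why the savings grow with wider layers: for the small 4-channel example the ratio is about 1/3, but for typical layers with hundreds of channels it approaches 1/k².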

Reference:
Depthwise separable convolution

Origin blog.csdn.net/weixin_40826634/article/details/128199814