Separable Convolutions: A Detailed Explanation


Separable convolutions include spatially separable convolutions and depthwise separable convolutions.

Suppose a feature map has shape [channel, height, width]:

  • "Spatial" refers to the two dimensions [height, width].
  • "Depth" refers to the channel dimension.

Spatial Separable Convolution

It has the following characteristics:

  • Fewer multiplications
  • Lower computational complexity
  • Faster networks

Spatially separable convolution splits a standard convolution into several smaller kernels along the spatial dimensions. For example, a kernel can be factored into the outer product of two (or more) vectors.

This kind of separable convolution operates on the spatial dimensions of images and kernels: height and width. It splits one kernel into two smaller kernels, most commonly a 3x3 kernel into a 3x1 and a 1x3 kernel. Instead of one convolution with 9 multiplications per position, we perform two convolutions with 3 multiplications each, for a total of 6 multiplications per position, and achieve the same effect.

(figure: a 3x3 kernel split into a 3x1 and a 1x3 kernel)

One of the most famous spatially separable kernels is the Sobel kernel (used for edge detection):

(figure: the Sobel kernel factored into a column vector and a row vector)
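As a quick sanity check, the separability of the Sobel kernel can be verified in a few lines of plain Python (a minimal sketch of the outer-product factorization):

```python
# Sobel kernel for vertical edges, written out in full
sobel = [[1, 0, -1],
         [2, 0, -2],
         [1, 0, -1]]

col = [1, 2, 1]    # 3x1 column vector
row = [1, 0, -1]   # 1x3 row vector

# The outer product of the two vectors reconstructs the 3x3 kernel
outer = [[c * r for r in row] for c in col]

print(outer == sobel)  # True: Sobel is spatially separable
```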

Spatially separable convolution therefore requires fewer multiplications than standard convolution.

In general, spatially separable convolution splits an nxn convolution into an nx1 step followed by a 1xn step.

  • An ordinary 3x3 convolution on a 5x5 feature map works as shown in the figure below: each position requires 9 multiplications, and with 9 positions in total, the whole operation requires 81 multiplications.

(figure: standard 3x3 convolution sliding over a 5x5 feature map)

  • The spatially separable version of the same computation is shown in the figure below. The first step uses a 3x1 filter: 15 output positions x 3 multiplications = 45. The second step uses a 1x3 filter on the intermediate result: 9 output positions x 3 multiplications = 27. In total, 72 multiplications produce the same result, fewer than the 81 of the ordinary convolution.

(figure: the two-step 3x1 then 1x3 convolution on the same 5x5 feature map)
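The multiplication counts above can be reproduced with a small helper (a sketch assuming "valid" convolution, i.e. no padding and stride 1):

```python
def mults_standard(h, w, n):
    """Multiplications for an n x n 'valid' convolution on an h x w map."""
    positions = (h - n + 1) * (w - n + 1)
    return positions * n * n

def mults_separable(h, w, n):
    """An n x 1 pass followed by a 1 x n pass on the intermediate map."""
    step1 = (h - n + 1) * w * n              # n x 1 kernel: (h-n+1) x w output positions
    step2 = (h - n + 1) * (w - n + 1) * n    # 1 x n kernel on the intermediate map
    return step1 + step2

print(mults_standard(5, 5, 3))   # 81
print(mults_separable(5, 5, 3))  # 45 + 27 = 72
```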

Depthwise Separable Convolution

Depthwise separable convolution is described in Google's Xception and MobileNet papers. Its core idea is to decompose a complete convolution into two steps: a depthwise convolution and a pointwise convolution.

Regular convolution

Assume the input is a 64x64-pixel, three-channel color image. After a convolutional layer containing 4 filters, 4 feature maps are output, with the same spatial size as the input. The whole process is summarized in the figure below.

(figure: regular convolution with 4 filters applied to a 3-channel input)

At this point the convolutional layer has 4 filters, each containing 3 kernels of size 3x3. The parameter count of the layer is therefore:

N_std = 4 × 3 × 3 × 3 = 108

Depthwise Convolution (filtering)

In the same example, the 64x64-pixel three-channel image first undergoes a depthwise convolution. The difference from regular convolution is that each filter operates entirely within a single two-dimensional plane, and the number of filters equals the depth of the input layer. The three-channel image is thus processed into three feature maps, as shown in the figure below.

(figure: depthwise convolution producing one feature map per input channel)

Each of the 3 filters contains a single kernel of size 3x3, so the parameter count of this step is:

N_depthwise = 3 × 3 × 3 = 27

After the depthwise convolution, the number of feature maps equals the depth of the input layer. However, since each channel is convolved independently, this step does not combine information from different channels at the same spatial position. A second step is therefore needed to combine these maps into new feature maps: the pointwise convolution.
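The channel independence of depthwise convolution can be checked directly (a sketch using the PyTorch `groups` mechanism discussed later in this article): zeroing one input channel zeroes only the corresponding output channel.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Depthwise convolution: groups == in_channels, one 3x3 kernel per channel
dw = nn.Conv2d(3, 3, kernel_size=3, padding=1, groups=3, bias=False)

x = torch.randn(1, 3, 8, 8)
x[:, 1] = 0.0  # zero out the second input channel

out = dw(x)
# Only output channel 1 is all zeros: each output channel depends
# solely on its matching input channel
print([bool((out[:, c] == 0).all()) for c in range(3)])  # [False, True, False]
```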

Pointwise Convolution (combination)

Pointwise convolution is very similar to regular convolution, except that the kernel size is 1×1×M, where M is the depth of the previous layer. The convolution therefore takes a weighted combination of the previous step's maps along the depth direction to generate each new feature map; with N filters, N feature maps are produced, as shown below.

(figure: pointwise 1x1 convolution combining 3 maps into 4 feature maps)

Since 1×1 kernels are used, the parameter count of this step is:

N_pointwise = 1 × 1 × 3 × 4 = 12

Parameter comparison

Parameter calculation

Refer to <<Convolution parameter calculation (standard convolution, grouped convolution, depthwise separable convolution)>>.
Simply put, the parameter count is the number of weights that must be learned. Every point on the convolution kernel is a learnable weight, so a single-channel k×k kernel contributes k² parameters. The kernel is expanded to as many channels as the input, and each input channel is convolved with its own kernel slice; one such kernel group is created for each output channel. Multiplying these factors gives the total.

  • Input size: H × W × C1
  • Convolution kernel: k × k × C1
  • Number of output feature-map channels: C2
  • Standard convolution parameter count: k × k × C1 × C2
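The formula in the last bullet can be written as a one-line helper (a minimal sketch, ignoring bias terms):

```python
def std_conv_params(k, c1, c2):
    """Parameter count of a standard k x k convolution from c1 to c2 channels (no bias)."""
    return k * k * c1 * c2

# The running example: 3x3 kernels, 3 input channels, 4 output channels
print(std_conv_params(3, 3, 4))  # 108
```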

The number of parameters for conventional convolution is:

N_std = 4 × 3 × 3 × 3 = 108

Depthwise separable convolution is mainly composed of:

  • Depthwise conv: responsible for filtering. The filters have size Dk × Dk × 1 (Dk is the depthwise kernel size), and there are M of them, one per input channel; the 1 means each filter sees a single channel, and the input is split into M groups along the channel dimension.
  • Pointwise conv: responsible for channel conversion. The filters have size 1 × 1 × M, and there are N of them; each 1×1 kernel spans all M channels, and the N kernels produce N output channels.

The parameters of Separable Convolution are obtained by adding two parts:

N_depthwise = 3 × 3 × 3 = 27
N_pointwise = 1 × 1 × 3 × 4 = 12
N_separable = N_depthwise + N_pointwise = 39
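The comparison can be reproduced with a small sketch (no bias terms; Dk = 3, M = 3 input channels, N = 4 output channels as in the running example):

```python
def separable_conv_params(k, c_in, c_out):
    """Depthwise (k x k x 1, one per input channel) plus pointwise (1 x 1 x c_in, c_out of them)."""
    depthwise = k * k * c_in           # 3 x 3 x 3 = 27
    pointwise = 1 * 1 * c_in * c_out   # 1 x 1 x 3 x 4 = 12
    return depthwise + pointwise

n_std = 3 * 3 * 3 * 4                  # 108, standard convolution
n_sep = separable_conv_params(3, 3, 4)
print(n_sep)                           # 39
print(round(n_sep / n_std, 2))         # 0.36, roughly 1/3
```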

With the same input, 4 feature maps are again obtained, but the separable convolution uses only about 1/3 of the parameters of the conventional convolution (39 vs. 108). Under the same parameter budget, a network built with separable convolutions can therefore be made deeper.

PyTorch implementation of separable convolution


The input is (N, C_in, H, W) and the output is (N, C_out, H_out, W_out).

dilation controls the spacing between kernel elements (dilated convolution);

groups controls the connections between inputs and outputs; both in_channels and out_channels must be divisible by groups. Different settings of groups distinguish ordinary, grouped, and depthwise convolution:

  • When groups=1, it is an ordinary convolutional layer.
  • When 1 < groups < in_channels, it is an ordinary grouped convolution.
    • For example, with groups=2 the operation is equivalent to two convolution layers side by side, each seeing half the input channels and producing half the output channels, with the two results concatenated.
  • When groups=in_channels, it is a depthwise convolution: each input channel is convolved with its own set of out_channels/in_channels single-channel filters.
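The effect of groups can be checked directly on the weight tensor: for nn.Conv2d, the weight shape is [out_channels, in_channels // groups, kH, kW] (a quick sketch):

```python
import torch.nn as nn

# groups=1: ordinary convolution, every filter sees all 4 input channels
print(nn.Conv2d(4, 6, 3, groups=1).weight.shape)  # torch.Size([6, 4, 3, 3])

# groups=2: grouped convolution, each filter sees half the input channels
print(nn.Conv2d(4, 6, 3, groups=2).weight.shape)  # torch.Size([6, 2, 3, 3])

# groups=in_channels: depthwise convolution, one single-channel filter per channel
print(nn.Conv2d(4, 4, 3, groups=4).weight.shape)  # torch.Size([4, 1, 3, 3])
```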

Sample code

import torch.nn as nn
import torch
from torchsummary import summary 

class Conv_test(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, padding, groups):
        super(Conv_test, self).__init__()
        self.conv = nn.Conv2d(
            in_channels=in_ch,
            out_channels=out_ch,
            kernel_size=kernel_size,
            stride=1,
            padding=padding,
            groups=groups,
            bias=False
        )

    def forward(self, input):
        out = self.conv(input)
        return out
        
Standard convolution
# Standard convolution layer: input is 3x64x64, target output is 4 feature maps
# Parameters: 4x3x3x3 = 108
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
conv = Conv_test(3, 4, 3, 1, 1).to(device)
summary(conv, input_size=(3, 64, 64))


Depthwise convolution
# Depthwise convolution layer, input same as above
# Parameters: 3 single-channel 3x3 kernels, i.e. 27
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
conv = Conv_test(3, 3, 3, padding=1, groups=3).to(device)
summary(conv, input_size=(3, 64, 64))


Pointwise convolution
# Pointwise convolution: the input is the output size of the depthwise convolution; target output is 4 feature maps
# Parameters: 1x1x3x4 = 12
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
conv = Conv_test(3, 4, kernel_size=1, padding=0, groups=1).to(device)
summary(conv, input_size=(3, 64, 64))


Grouped convolution
# Grouped convolution layer: input is 4x64x64, target output is 6 feature maps
# Parameter count: 2x3x3x6 = 108
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
conv = Conv_test(4, 6, 3, padding=1, groups=2).to(device)
summary(conv, input_size=(4, 64, 64))


References

https://blog.csdn.net/qq_40406731/article/details/107398593
https://icode.best/i/06017834795045
https://blog.csdn.net/weixin_44638957/article/details/105177543

Origin: blog.csdn.net/BXD1314/article/details/125749761