Implementing depthwise separable convolution in PyTorch (with MobileNet as an example)

Before introducing depthwise separable convolution, let's first talk about grouped convolution.

Grouped convolution

Reference [1] describes grouped convolution in detail, so the principle will not be repeated here; please refer to that link. However, that article does not explain the actual code for grouped convolution very well, so here the focus is on explaining the code.

Grouped convolution can be implemented by calling torch.nn.Conv2d() directly. Conv2d has a parameter called groups, which is the key to this functionality. The following describes this parameter in detail.

groups

The grouping of the convolution operation is controlled by groups. The default is groups=1, meaning the input is treated as a single group, which is an ordinary convolution. A value of n for groups means the input channels are divided into n groups. When groups=in_channels, each input channel forms its own group and is convolved separately; if each group produces k output channels, the group outputs are concatenated, so the final channel count is in_channels * k.
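As a quick sanity check (a minimal sketch of our own, not from the original article), the shape of the Conv2d weight tensor directly reflects the grouping: each kernel only spans in_channels // groups input channels.

import torch.nn as nn

# Ordinary convolution: each of the 30 kernels spans all 3 input channels.
conv_plain = nn.Conv2d(3, 30, kernel_size=3, groups=1, bias=False)
print(conv_plain.weight.shape)    # torch.Size([30, 3, 3, 3])

# Grouped convolution with groups=3: each kernel spans only 3 // 3 = 1 channel.
conv_grouped = nn.Conv2d(3, 30, kernel_size=3, groups=3, bias=False)
print(conv_grouped.weight.shape)  # torch.Size([30, 1, 3, 3])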

To elaborate on this, here are a few concrete examples. To keep the calculations simple, bias=False throughout; the torchsummary module is also used, which can be installed with pip.

import torch
from torchsummary import summary
import torch.nn as nn

# A thin wrapper around nn.Conv2d so we can experiment with the groups parameter
class CSDN_Tem(nn.Module):
    def __init__(self, in_ch, out_ch, groups):
        super(CSDN_Tem, self).__init__()
        self.conv = nn.Conv2d(
            in_channels=in_ch,
            out_channels=out_ch,
            kernel_size=3,
            stride=1,
            padding=1,
            groups=groups,
            bias=False
        )

    def forward(self, input):
        out = self.conv(input)
        return out

For testing, the input is 64 * 64 * 3; you can think of it as a 64 * 64 input image with 3 channels. It is processed with 3 * 3 convolution kernels, and the final output shape is 64 * 64 * 30.

First, an ordinary convolution with groups set to 1:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
conv = CSDN_Tem(3, 30, 1).to(device)
print(summary(conv,  input_size=(3, 64, 64)))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 30, 64, 64]             810
================================================================
Total params: 810
Trainable params: 810
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.05
Forward/backward pass size (MB): 0.94
Params size (MB): 0.00
Estimated Total Size (MB): 0.99
----------------------------------------------------------------

In total there are 3 * 3 (kernel size) * 3 (input channels) * 30 (output channels) = 810 parameters.
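More generally, a Conv2d layer with bias=False has out_channels * (in_channels / groups) * kH * kW parameters. A small helper (our own illustrative snippet, not part of the original code) makes this easy to check against the summaries:

def conv2d_params(in_ch, out_ch, k, groups=1):
    # Each of the out_ch kernels covers in_ch // groups input channels.
    return out_ch * (in_ch // groups) * k * k

print(conv2d_params(3, 30, 3, groups=1))  # 810, matching the summary above
print(conv2d_params(3, 30, 3, groups=3))  # 270, see the next example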

The following uses groups = in_channels:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
conv = CSDN_Tem(3, 30, 3).to(device)
print(summary(conv,  input_size=(3, 64, 64)))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 30, 64, 64]             270
================================================================
Total params: 270
Trainable params: 270
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.05
Forward/backward pass size (MB): 0.94
Params size (MB): 0.00
Estimated Total Size (MB): 0.99
----------------------------------------------------------------

We can see that the parameter count drops significantly. Following the principle in reference [1]: in the ordinary convolution above, each kernel has size 3 * 3 * 3, and the input image is mapped to 30 output channels, giving 3 * 3 * 3 * 30 = 810 parameters. When groups is 3, however, the three channels of the input image are split into three groups, and the output is likewise produced as three groups. Of the 30 output channels, each block of 10 corresponds to a single input channel, so each kernel only needs to be 3 * 3 * 1. The parameter count therefore becomes 3 * 3 * 1 * (10 + 10 + 10) = 270: the output has the same dimensions, but the number of parameters is reduced by a factor of three.
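To make the grouping concrete, here is an illustrative sketch of our own (assuming the same 3-in / 30-out setup) showing that a groups=3 convolution behaves exactly like three independent convolutions, one per input channel, whose outputs are concatenated:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 64, 64)
grouped = nn.Conv2d(3, 30, kernel_size=3, padding=1, groups=3, bias=False)

# Apply each group's 10 kernels to its single input channel, then concatenate.
outs = []
for g in range(3):
    w = grouped.weight[g * 10:(g + 1) * 10]           # shape (10, 1, 3, 3)
    outs.append(F.conv2d(x[:, g:g + 1], w, padding=1))
manual = torch.cat(outs, dim=1)

print(torch.allclose(grouped(x), manual))  # True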

Pointwise convolution

Pointwise convolution uses 1 * 1 kernels to convolve the output of the grouped convolution, mixing the grouped channels so that the final output is restored to the desired overall shape. See reference [2] for a more detailed description.
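As a small illustration of our own, a pointwise convolution is just nn.Conv2d with kernel_size=1; it mixes information across channels without changing the spatial dimensions:

import torch
import torch.nn as nn

pointwise = nn.Conv2d(in_channels=3, out_channels=30, kernel_size=1, bias=False)
x = torch.randn(1, 3, 64, 64)
print(pointwise(x).shape)  # torch.Size([1, 30, 64, 64]), using only 3 * 30 = 90 weights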

Depthwise separable convolution

Depthwise separable convolution splits into a depthwise convolution and a pointwise convolution. The depthwise convolution not only requires groups to equal in_channels, it also requires in_channels to equal out_channels; in other words, each channel is convolved separately. Let's look at an example:

Note in particular that the height and width of the feature map should be kept the same throughout, so padding must be chosen accordingly.

class CSDN_Tem(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, padding, groups):
        super(CSDN_Tem, self).__init__()
        self.conv = nn.Conv2d(
            in_channels=in_ch,
            out_channels=out_ch,
            kernel_size=kernel_size,
            stride=1,
            padding=padding,
            groups=groups,
            bias=False
        )

    def forward(self, input):
        out = self.conv(input)
        return out

An ordinary convolution looks like this:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
conv = CSDN_Tem(3, 30, 3, 1, 1).to(device)
print(summary(conv,  input_size=(3, 64, 64)))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 30, 64, 64]             810
================================================================
Total params: 810
Trainable params: 810
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.05
Forward/backward pass size (MB): 0.94
Params size (MB): 0.00
Estimated Total Size (MB): 0.99
----------------------------------------------------------------

Depthwise separable convolution splits into two steps. The first step is the depthwise convolution:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
conv = CSDN_Tem(3, 3, 3, padding=1, groups=3).to(device)
print(summary(conv,  input_size=(3, 64, 64)))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1            [-1, 3, 64, 64]              27
================================================================
Total params: 27
Trainable params: 27
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.05
Forward/backward pass size (MB): 0.09
Params size (MB): 0.00
Estimated Total Size (MB): 0.14
----------------------------------------------------------------

Then comes the pointwise convolution:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
conv = CSDN_Tem(3, 30, kernel_size=1, padding=0, groups=1).to(device)
print(summary(conv,  input_size=(3, 64, 64)))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 30, 64, 64]              90
================================================================
Total params: 90
Trainable params: 90
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.05
Forward/backward pass size (MB): 0.94
Params size (MB): 0.00
Estimated Total Size (MB): 0.98
----------------------------------------------------------------

This achieves the same output shape, but the parameter count drops from 810 to 27 + 90 = 117, a reduction of roughly a factor of seven.
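Putting the two steps together, here is a minimal sketch of a depthwise separable convolution as a single module (the class name DepthwiseSeparableConv and its default arguments are our own, following the MobileNet pattern):

import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super(DepthwiseSeparableConv, self).__init__()
        # Depthwise: groups == in_channels == out_channels.
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=kernel_size,
                                   padding=padding, groups=in_ch, bias=False)
        # Pointwise: 1 * 1 convolution to mix the channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, input):
        return self.pointwise(self.depthwise(input))

conv = DepthwiseSeparableConv(3, 30)
x = torch.randn(1, 3, 64, 64)
print(conv(x).shape)                              # torch.Size([1, 30, 64, 64])
print(sum(p.numel() for p in conv.parameters()))  # 27 + 90 = 117, versus 810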

References

[1] Understanding grouped convolution (Group Convolution) and depthwise separable convolution (Depthwise Separable Convolution), with PyTorch implementations
[2] Separable convolutions in convolutional neural networks


Source: blog.csdn.net/Einstellung/article/details/103585835