Before introducing depthwise separable convolution, let us first look at grouped convolution.
Grouped convolution
Reference [1] already describes grouped convolution in detail, so the principle is not repeated here; see that link. However, I did not find its explanation of the actual grouped convolution code very satisfying, so this post focuses on explaining the code.
Grouped convolution can be achieved by calling torch.nn.Conv2d() directly. Conv2d has a parameter called groups, which is the key to this functionality. The parameter is described in detail below.
groups
Grouped convolution is controlled by the groups parameter. The default is groups=1, which means the whole input forms a single group; this is a conventional convolution. The value of groups is the number of groups the input channels are divided into, and both in_channels and out_channels must be divisible by it. When groups=in_channels, each input channel forms its own group and is convolved separately. If each group produces k output channels, the per-group outputs are concatenated and the final number of channels is in_channels * k.
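One way to see what groups does is to inspect the weight tensor that Conv2d allocates. The following is a minimal sketch; the channel counts (6 in, 12 out) are hypothetical values picked only because they are divisible by several group sizes.

```python
import torch.nn as nn

# Hypothetical sizes for illustration: 6 input channels, 12 output channels.
in_ch, out_ch = 6, 12

weight_shapes = {}
for groups in (1, 2, 3, 6):
    conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1,
                     groups=groups, bias=False)
    # Weight layout is (out_channels, in_channels // groups, kH, kW):
    # each filter only sees the input channels of its own group.
    weight_shapes[groups] = tuple(conv.weight.shape)
    print(groups, weight_shapes[groups])
```

The second dimension shrinks from 6 (groups=1) to 1 (groups=in_channels), which is where the parameter savings discussed below come from.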
To make this concrete, here are a few specific examples. To keep the parameter counts simple, bias=False is used throughout. The examples also use the torchsummary module, which can be installed with pip.
import torch
from torchsummary import summary
import torch.nn as nn


class CSDN_Tem(nn.Module):
    def __init__(self, in_ch, out_ch, groups):
        super(CSDN_Tem, self).__init__()
        # 3x3 kernel with stride 1 and padding 1 keeps height and width unchanged
        self.conv = nn.Conv2d(
            in_channels=in_ch,
            out_channels=out_ch,
            kernel_size=3,
            stride=1,
            padding=1,
            groups=groups,
            bias=False
        )

    def forward(self, input):
        out = self.conv(input)
        return out
For testing, the input is 64 * 64 * 3, which you can think of as a 64 * 64 image with 3 channels (passed to PyTorch as input_size=(3, 64, 64)). It is processed with a 3 * 3 convolution kernel, and the final output has shape 64 * 64 * 30.
First, a conventional convolution with groups=1.
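The claim that the spatial size stays 64 * 64 follows from the usual output-size formula for a convolution without dilation. The sketch below checks it against PyTorch; conv_out_size is a hypothetical helper written for this illustration, not part of any library.

```python
import torch
import torch.nn as nn

def conv_out_size(size, kernel_size, stride=1, padding=0):
    # Hypothetical helper: standard output-size formula, no dilation.
    return (size + 2 * padding - kernel_size) // stride + 1

conv = nn.Conv2d(3, 30, kernel_size=3, stride=1, padding=1, bias=False)
x = torch.randn(1, 3, 64, 64)  # PyTorch layout: (batch, channels, height, width)

expected_side = conv_out_size(64, kernel_size=3, stride=1, padding=1)
out_shape = tuple(conv(x).shape)
print(expected_side, out_shape)
```

Note that PyTorch puts the channel dimension before height and width, so the 64 * 64 * 30 output described in the text appears as (30, 64, 64) per sample.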
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
conv = CSDN_Tem(3, 30, 1).to(device)
print(summary(conv, input_size=(3, 64, 64)))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 30, 64, 64]             810
================================================================
Total params: 810
Trainable params: 810
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.05
Forward/backward pass size (MB): 0.94
Params size (MB): 0.00
Estimated Total Size (MB): 0.99
----------------------------------------------------------------
The total is 3 * 3 (kernel size) * 3 (input channels) * 30 (output channels) = 810 parameters.
Next, groups = in_channels.
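The same count can be reproduced without torchsummary by summing the element counts of the layer's parameters. This is a small sketch; conv_param_count is a hypothetical helper that encodes the product above (with the input channels divided by groups, which is 1 here).

```python
import torch.nn as nn

def conv_param_count(in_ch, out_ch, kernel_size, groups=1):
    # Hypothetical helper: weight count of a bias-free 2D convolution.
    return kernel_size * kernel_size * (in_ch // groups) * out_ch

conv = nn.Conv2d(3, 30, kernel_size=3, padding=1, groups=1, bias=False)

actual = sum(p.numel() for p in conv.parameters())
expected = conv_param_count(3, 30, kernel_size=3, groups=1)
print(actual, expected)
```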
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
conv = CSDN_Tem(3, 30, 3).to(device)
print(summary(conv, input_size=(3, 64, 64)))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 30, 64, 64]             270
================================================================
Total params: 270
Trainable params: 270
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.05
Forward/backward pass size (MB): 0.94
Params size (MB): 0.00
Estimated Total Size (MB): 0.99
----------------------------------------------------------------
The number of parameters is significantly reduced. Following the principle in reference [1]: in the conventional convolution above, each kernel has size 3 * 3 * 3 and maps the whole input image to one of the 30 output channels, so the count is 810 parameters. When groups is 3, the three input channels are split into three groups, and the output is split into three groups as well. Of the 30 output channels, each input channel is responsible for 10. Each kernel therefore only needs size 3 * 3 * 1, and the parameter count becomes 1 * 3 * 3 * (10 + 10 + 10) = 270. The output dimensions are the same, but the parameter count drops to one third.
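To illustrate the "split, convolve separately, concatenate" description, the sketch below rebuilds the groups=3 result by hand with torch.chunk and torch.nn.functional.conv2d and compares it with what Conv2d computes. It assumes the layer's filters are assigned to groups in order along the output-channel axis, which is how the weights are chunked here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
groups = 3
conv = nn.Conv2d(3, 30, kernel_size=3, padding=1, groups=groups, bias=False)
x = torch.randn(1, 3, 64, 64)

with torch.no_grad():
    # Split input channels and filters into `groups` chunks.
    x_chunks = torch.chunk(x, groups, dim=1)            # each (1, 1, 64, 64)
    w_chunks = torch.chunk(conv.weight, groups, dim=0)  # each (10, 1, 3, 3)

    # Convolve every chunk on its own, then concatenate along channels.
    manual = torch.cat(
        [F.conv2d(xc, wc, padding=1) for xc, wc in zip(x_chunks, w_chunks)],
        dim=1,
    )
    same = torch.allclose(manual, conv(x), atol=1e-5)

params_per_group = w_chunks[0].numel()  # 10 * 1 * 3 * 3
total_params = params_per_group * groups
print(same, params_per_group, total_params)
```

Each group holds 90 weights, and three groups give the 270 parameters reported by torchsummary.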
Pointwise convolution
Pointwise convolution applies a 1 * 1 convolution kernel to the output of the grouped convolution. It mixes the channels at every pixel so that the final output has the same shape as that of a conventional convolution. See reference [2] for a more detailed description.
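A 1 * 1 convolution can be read as applying one small matrix to the channel vector at every pixel. The following sketch, with illustrative sizes of 3 input and 30 output channels, checks that reading with torch.einsum.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
pointwise = nn.Conv2d(3, 30, kernel_size=1, bias=False)
x = torch.randn(1, 3, 64, 64)

with torch.no_grad():
    out = pointwise(x)
    # The weight has shape (30, 3, 1, 1); dropping the 1x1 dims gives a 30x3 matrix.
    mix = pointwise.weight.view(30, 3)
    # Apply the same matrix to the 3-channel vector at every pixel.
    manual = torch.einsum('oc,bchw->bohw', mix, x)
    matches = torch.allclose(out, manual, atol=1e-5)

out_shape = tuple(out.shape)
print(out_shape, matches)
```

Because the kernel covers a single pixel, the layer changes only the number of channels and leaves height and width untouched.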
Depthwise separable convolution
Depthwise separable convolution splits into a depthwise convolution and a pointwise convolution. The depthwise convolution requires not only that groups equal in_channels, but also that out_channels equal in_channels. In other words, each channel is convolved separately. Let us look at an example:
Note in particular that the height and width of the image should stay the same throughout.
class CSDN_Tem(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, padding, groups):
        super(CSDN_Tem, self).__init__()
        # kernel_size and padding are arguments so the same class can build
        # both the 3x3 depthwise step and the 1x1 pointwise step
        self.conv = nn.Conv2d(
            in_channels=in_ch,
            out_channels=out_ch,
            kernel_size=kernel_size,
            stride=1,
            padding=padding,
            groups=groups,
            bias=False
        )

    def forward(self, input):
        out = self.conv(input)
        return out
The conventional convolution looks like this:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
conv = CSDN_Tem(3, 30, 3, 1, 1).to(device)
print(summary(conv, input_size=(3, 64, 64)))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 30, 64, 64]             810
================================================================
Total params: 810
Trainable params: 810
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.05
Forward/backward pass size (MB): 0.94
Params size (MB): 0.00
Estimated Total Size (MB): 0.99
----------------------------------------------------------------
Depthwise separable convolution takes two steps. The first step is the depthwise convolution:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
conv = CSDN_Tem(3, 3, 3, padding=1, groups=3).to(device)
print(summary(conv, input_size=(3, 64, 64)))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1            [-1, 3, 64, 64]              27
================================================================
Total params: 27
Trainable params: 27
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.05
Forward/backward pass size (MB): 0.09
Params size (MB): 0.00
Estimated Total Size (MB): 0.14
----------------------------------------------------------------
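That each channel really is processed on its own in the depthwise step can be checked by changing a single input channel and seeing which output channels react. This is only a sketch: the perturbation of +1.0 and the 1e-6 threshold are arbitrary choices for the illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
depthwise = nn.Conv2d(3, 3, kernel_size=3, padding=1, groups=3, bias=False)
x = torch.randn(1, 3, 64, 64)

# Change input channel 0 only; channels 1 and 2 stay identical.
x_perturbed = x.clone()
x_perturbed[:, 0] += 1.0

with torch.no_grad():
    diff = (depthwise(x_perturbed) - depthwise(x)).abs()

# For each output channel, did anything change?
changed = [diff[:, c].max().item() > 1e-6 for c in range(3)]
print(changed)
```

Only output channel 0 should respond, since with groups=in_channels every output channel is computed from exactly one input channel. Mixing information across channels is left to the pointwise step.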
The second step is the pointwise convolution:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
conv = CSDN_Tem(3, 30, kernel_size=1, padding=0, groups=1).to(device)
print(summary(conv, input_size=(3, 64, 64)))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 30, 64, 64]              90
================================================================
Total params: 90
Trainable params: 90
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.05
Forward/backward pass size (MB): 0.94
Params size (MB): 0.00
Estimated Total Size (MB): 0.98
----------------------------------------------------------------
The output shape is the same as that of the conventional convolution. However, the two steps together need only 27 + 90 = 117 parameters instead of 810, a reduction of roughly 7-fold.
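The two steps can also be chained in a single module. The sketch below is one possible way to do it; the class name DepthwiseSeparableConv and the helper count_params are hypothetical names introduced for this example, and the parameter counts are computed with numel instead of torchsummary.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Hypothetical wrapper: depthwise k x k convolution, then pointwise 1x1."""

    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        # Depthwise step: one filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=kernel_size,
                                   padding=padding, groups=in_ch, bias=False)
        # Pointwise step: 1x1 convolution that mixes channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

def count_params(module):
    return sum(p.numel() for p in module.parameters())

separable = DepthwiseSeparableConv(3, 30)
regular = nn.Conv2d(3, 30, kernel_size=3, padding=1, bias=False)

out_shape = tuple(separable(torch.randn(1, 3, 64, 64)).shape)
separable_params = count_params(separable)
regular_params = count_params(regular)
print(out_shape, separable_params, regular_params)
```

With these sizes the separable module holds 27 + 90 = 117 weights against 810 for the conventional layer, while producing an output of the same shape.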
References
[1] Understanding Depthwise Separable Convolution and Group Convolution, how they relate, and their PyTorch implementation
[2] Separable Convolution in convolutional neural networks