Commonly used functions in PyTorch (3): A detailed summary of common convolution operations in deep learning


1. Standard Convolution

1.1 Understanding standard convolution

Let's look directly at two-dimensional convolution, which is the most common in practical applications.

(figure: a 2D convolution: input, Conv2D filter, and output)

The Conv2D block in the figure above is a convolution kernel, also called a filter. The values of the filter determine the output; the process of training a model is adjusting these values to make the network's output more accurate. Let's look at how the convolution algorithm combines filters and inputs.

Conceptually, for each filter, to calculate an output value we look at the region of the input under the filter window.
In this example, we look at a 3×3 pixel region and use this region together with the filter to compute the output. You can think of a filter as containing a pattern; the output value measures how well the input matches that pattern. In a network, a shallow layer might look for a certain color or an edge, while a deep layer might look for a dog.

(figure: computing one output value from a 3×3 input region)

After calculating the value of this area, we move to the next area and perform the same calculation again to get the next output value.

(figure: sliding the window to the next region)

We continue this process and compute the other outputs in this row. We then switch to a new row in both the input and the output, and repeat the previous operations for this new row.

We continue this process row by row until the entire input spatial range is covered.

(figure: covering the input row by row)

Then we change to the next filter and repeat the entire process with this new filter to form the next output feature. We do this for each filter to form all of the output features.

(figure: repeating the process with each filter to form every output feature)

1.2 API in PyTorch

torch.nn.Conv2d(
     in_channels,   # number of input channels (= the depth of each kernel)
     out_channels,  # number of output channels (= the number of kernels)
     kernel_size,   # kernel size
     stride=1,      # stride
     padding=0,     # padding
     dilation=1,    # spacing between kernel elements
     groups=1,      # grouped convolution
     bias=True,
     padding_mode='zeros', # pad the borders with zeros by default
     device=None,
     dtype=None
)

  • in_channels: the depth (channel count) of the input feature matrix. For example, an RGB color image has in_channels=3.

  • out_channels: the number of convolution kernels; using n kernels produces an output feature matrix of depth (channel count) n.

  • kernel_size: the size of the convolution kernel. It can be an int, e.g. 3 means the kernel's height = width = 3, or a tuple, e.g. (3, 5) means height = 3 and width = 5.

  • stride: the stride of the convolution, which defaults to 1. Like kernel_size, it can be an int or a tuple.

  • padding: zero-padding around the input feature matrix, defaulting to 0. An int such as 1 pads one row of zeros at the top and bottom and one column of zeros at the left and right (i.e. a ring of zeros). A tuple such as (2, 1) pads two rows at the top and bottom and one column at the left and right.

  • bias: whether to use a bias term (enabled by default).

  • dilation and groups are more advanced options.

  • The channel count of a CNN convolution kernel = the channel count of the layer's input.

  • The output channel count (depth) of a CNN convolution layer = the number of convolution kernels.

Calculating the parameters and computational cost of standard convolution

(figure: parameter and computation count for standard convolution)

The formula for the size of the output matrix after convolution is:

N = (W - K + 2P) / S + 1

For example: input matrix H = W = 5, kernel K = 2, stride S = 2, padding P = 1.

N = (5 - 2 + 2×1) / 2 + 1 = 3.5

How does PyTorch deal with this?

Conclusion: during the convolution, the last row and the last column are simply ignored so that N stays an integer. In that case N = (5 - 2 + 2×1 - 1) / 2 + 1 = 3 [i.e. the division is rounded down].

Note: the in_channels of the kernel must match the number of channels of the data x being convolved.
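A quick check of this rounding behavior (a minimal sketch using the numbers from the example above):

import torch
import torch.nn as nn

# H = W = 5, K = 2, S = 2, P = 1: N = (5 - 2 + 2×1) / 2 + 1 = 3.5,
# which PyTorch floors to 3 by ignoring the last row and column
x = torch.rand(1, 3, 5, 5)
conv = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=2, stride=2, padding=1)
print(conv(x).shape)  # torch.Size([1, 1, 3, 3])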

1.3 Case

For example:

Take a 12×12×3 input feature map: convolving it with one 5×5×3 kernel gives an 8×8×1 output feature map. If we instead have 256 such kernels, we get an 8×8×256 output feature map.

(figure: a 12×12×3 input convolved with 256 kernels of size 5×5×3 gives an 8×8×256 output)

import torch
import torch.nn as nn

x = torch.rand(size=(1, 3, 12, 12))


model =  nn.Sequential(
     nn.Conv2d(
               in_channels=3,  # must match the number of channels of the input x
               out_channels=1, # number of output channels, i.e. the number of kernels
               kernel_size=(5,5)
              )
)

# torch.Size([1, 1, 8, 8])
# N = (W - K + 2P) / S + 1 = (12 - 5 + 2 * 0 ) / 1 + 1 = 8
print(model(x).shape)  



model =  nn.Sequential(
     nn.Conv2d(
          in_channels=3,    # number of input channels (= kernel depth)
          out_channels=256, # number of output channels (= number of kernels)
          kernel_size=(5,5) # kernel size
     )
)

# torch.Size([1, 256, 8, 8])
# N = (W - K + 2P) / S + 1 = (12 - 5 + 2 * 0 ) / 1 + 1 = 8
print(model(x).shape) 
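As a quick sanity check of the parameter count for the 256-kernel layer above (each 5×5×3 kernel holds 75 weights):

# 5*5*3*256 weights + 256 biases = 19456 parameters
print(sum(p.numel() for p in model.parameters()))  # 19456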

2. Grouped Convolution

2.1 Understanding grouped convolution

Large, deep networks that must distinguish many kinds of visual scenes need a large number of features, especially in the deeper layers, and this exposes the scaling problem of convolution's cost.

As shown in the figure below, at deeper levels, the number of input and output features of each layer increases.

(figure: input and output feature counts grow in deeper layers)

As shown in the figures below, increasing the input channels makes each filter deeper, while increasing the output channels means more filters. Therefore, doubling the number of features roughly quadruples the amount of computation.

Before:

(figure: the original layer)

After increasing the input channels and the output channels:

(figure: the layer after the input and output channels are increased)

Think about it: does every filter really need to look at every input feature? Certainly not.

Therefore, we can split the input features into two groups, with each filter looking at only one of them: the first half of the filters looks at the first group of inputs, and the second half looks at the other group.

(figure: the input features split into two groups)

We start with the first group of input features and apply the corresponding filters. Note that each filter's depth matches the depth of the group, not the depth of the entire input; this is where the performance savings come from.

(figure: each filter spans only the depth of its group)

When half of the filters have been used, we move on to the second group of features and apply the remaining filters. This is no different from splitting the input and the filters, performing two separate convolutions, and then concatenating the results [which is exactly what groups=2 does]; see the sketch after the figure below.

(figure: the second half of the filters processes the second group)
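Here is a minimal sketch of that equivalence (the shapes are arbitrary, chosen only for illustration): a groups=2 convolution produces the same result as two independent convolutions over the two channel halves, concatenated.

import torch
import torch.nn as nn

x = torch.rand(1, 4, 8, 8)
g_conv = nn.Conv2d(4, 6, kernel_size=3, padding=1, groups=2, bias=False)

# two separate convolutions, one per group (each sees 2 of the 4 channels)
conv_a = nn.Conv2d(2, 3, kernel_size=3, padding=1, bias=False)
conv_b = nn.Conv2d(2, 3, kernel_size=3, padding=1, bias=False)
with torch.no_grad():
    conv_a.weight.copy_(g_conv.weight[:3])  # filters for the first group
    conv_b.weight.copy_(g_conv.weight[3:])  # filters for the second group

y_grouped = g_conv(x)
y_split = torch.cat([conv_a(x[:, :2]), conv_b(x[:, 2:])], dim=1)
print(torch.allclose(y_grouped, y_split, atol=1e-6))  # True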

2.2 API in PyTorch

The input feature map is divided into groups, and each group is convolved separately. The grouping is only along the depth [channels], i.e. consecutive channels form a group, with C1/g channels per group. For example, if the input feature map has C1 = 20 channels and we divide it into g = 5 groups, then each group contains 4 channels, and each kernel in that group has depth 4.

(figure: grouped convolution: the input channels divided into g groups)

torch.nn.Conv2d(
     in_channels,   
     out_channels,  
     kernel_size, 
     stride=1,   
     padding=0,  
     dilation=1, 
     groups=1,   # grouped convolution; defaults to a single group
     bias=True, 
     padding_mode='zeros' # pad the borders with zeros by default
)

2.3 Case

x = torch.rand(size=(1, 256, 12, 12))

model =  nn.Sequential(
     nn.Conv2d(
          in_channels=256, 
          out_channels=32,   # number of output channels, i.e. 32 kernels (4 per group)
          kernel_size=(3,3),
          padding=1,
          groups=8           # split into 8 groups; each kernel sees 256/8 = 32 input channels
     )
)

# torch.Size([1, 32, 12, 12])
print(model(x).shape)
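The weight tensor confirms that each kernel only spans its own group: its shape is (out_channels, in_channels / groups, kH, kW).

# each of the 32 kernels has depth 256/8 = 32, not 256
print(model[0].weight.shape)  # torch.Size([32, 32, 3, 3])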

3. Pointwise Convolution (PW Convolution)

3.1 Understanding pointwise convolution

(figure: pointwise (1×1) convolution)

The pointwise convolution operation is very similar to the regular convolution operation. Its kernel has size 1×1×M, where M is the number of channels of the previous layer.

So the convolution here takes a weighted combination of the previous step's feature maps in the depth direction; the goal is to generate new feature maps.

3.2 API and case in PyTorch

torch.nn.Conv2d(
     in_channels,   
     out_channels,  
     kernel_size, # for pointwise convolution, set the kernel size to 1
     stride=1,   
     padding=0,  
     dilation=1, 
     groups=1,   
     bias=True, 
     padding_mode='zeros'
)

Suppose we have an 8×8×3 feature map. Convolving it with 256 kernels of size 1×1×3, the output feature map becomes 8×8×256.

x = torch.rand(size=(1, 3, 8, 8))

model =  nn.Sequential(
     nn.Conv2d(
          in_channels=3,
          out_channels=256, # number of output channels, i.e. 256 kernels of size 1×1×3
          kernel_size=(1,1) # pointwise convolution: the kernel size is 1
     )
)

# torch.Size([1, 256, 8, 8])
print(model(x).shape)
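A quick parameter count for the layer above, as a sanity check:

# 1*1*3*256 weights + 256 biases = 1024 parameters
print(sum(p.numel() for p in model.parameters()))  # 1024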

4. Depthwise Convolution (DW Convolution)

4.1 Understanding depthwise convolution

Think about it: what happens if we split out every channel into its own group? [This is depthwise convolution.]

The problem is that each filter then sees too few channels and too little of the feature map's depth to gather enough useful information. This can be solved by combining it with pointwise convolution, i.e. [depthwise separable convolution].

(figure: depthwise convolution: one single-channel filter per input channel)

Calculating the parameter count of depthwise (channel-wise) convolution

(figure: parameter count of depthwise convolution)

4.2 API and case in PyTorch

The groups parameter is the key to implementing depthwise convolution. It defaults to 1, which treats the input as a single group; this is a regular convolution.

When it is set to in_channels, each channel of the input is treated as its own group and convolved separately.

torch.nn.Conv2d(
     in_channels,
     out_channels=in_channels,  # depthwise convolution: out_channels = in_channels
     kernel_size,
     stride=1,
     padding=0,
     dilation=1,
     groups=in_channels,        # depthwise convolution: each input channel is its own group
     bias=True,
     padding_mode='zeros'
)

(figure: a 12×12×3 input after a 5×5×1×3 depthwise convolution gives an 8×8×3 output)

Unlike standard convolution, here we split the convolution kernel into single-channel filters and convolve each channel separately, without changing the depth of the input feature map. This yields an output feature map with the same number of channels as the input. As shown above: a 12×12×3 input feature map, after a 5×5×1×3 depthwise convolution, gives an 8×8×3 output feature map. The channel count stays at 3.

x = torch.rand(size=(1, 3, 12, 12))

model =  nn.Sequential(
     nn.Conv2d(
          in_channels=3,    
          out_channels=3,    # out_channels=in_channels
          kernel_size=(5,5),
          groups=3           # groups=in_channels
     )
)

# torch.Size([1, 3, 8, 8])
print(model(x).shape)
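The weight shape confirms the single-channel kernels (a quick check on the model above):

# one 5×5 single-channel filter per input channel
print(model[0].weight.shape)  # torch.Size([3, 1, 5, 5])
# 5*5*1*3 weights + 3 biases = 78 parameters
print(sum(p.numel() for p in model.parameters()))  # 78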

5. Depthwise Separable Convolution (DW + PW)

5.1 Understanding depthwise separable convolution

In depthwise convolution we separated out all the groups. But now the first output feature depends only on the first input feature.

(figure: each output feature depends only on its own input group)

This pattern continues deeper into the network: each output only ever draws on a single group, so we never get more expressive power than a single group provides.

(figure: stacked depthwise convolutions never mix the groups)

In a standard convolution, one filter derives the first output feature from the entire original input; there the expressive power is strong.

(figure: a standard filter draws its first output feature from the whole input)

How do we solve this? Instead of stacking depthwise convolutions, we add a standard 1×1 convolution [a pointwise convolution] after each depthwise convolution.

[A pointwise convolution] covers only one pixel in space but receives all of the input features at once.

(figure: a pointwise filter: one pixel in space, all input features)

This perfectly complements depthwise convolution, which has a 3×3 receptive region in space but sees only one feature.

(figure: a depthwise filter: a 3×3 region in space, one feature)

When we combine them, the output of the two layers has a 3×3 spatial receptive field over all of the original features, matching the receptive field of a standard 3×3 convolution. This is [depthwise separable convolution]. You may have noticed that the pointwise convolution brings back our original problem (doubling the number of features quadruples its cost), but we still come out well ahead of standard convolution: the total computation performed by a 3×3 depthwise separable convolution is only about 11% of that of a standard 3×3 convolution, in other words roughly 9 times less. Strictly speaking, the speedup depends on the number of features, but as the number of features grows, the speedup gets closer and closer to the ideal 9×.

(figure: depthwise + pointwise: a 3×3 receptive field over all features)

5.2 Case of depthwise separable convolution

Depthwise separable convolution splits an ordinary convolution into a depthwise convolution followed by a pointwise convolution.

Take a 12×12×3 input feature map: convolving it with one 5×5×3 kernel gives an 8×8×1 output feature map. With 256 such kernels we get an 8×8×256 output feature map.

Standard convolution

# standard convolution
x = torch.rand(size=(1, 3, 12, 12))

model =  nn.Sequential(
     nn.Conv2d(
          in_channels=3,
          out_channels=256,
          kernel_size=(5,5)
     )
)

# torch.Size([1, 256, 8, 8])
print(model(x).shape)

Depthwise separable convolution

# depthwise separable convolution

x = torch.rand(size=(1, 3, 12, 12))

model =  nn.Sequential(
     # depthwise convolution
     nn.Conv2d(
          in_channels=3,
          out_channels=3,    # out_channels=in_channels
          kernel_size=(5,5),
          groups=3           # groups=in_channels
     ),
     # pointwise convolution
     nn.Conv2d(
          in_channels=3,
          out_channels=256, # number of output channels, i.e. 256 kernels of size 1×1×3
          kernel_size=(1,1) # pointwise convolution: the kernel size is 1
     )

)

# torch.Size([1, 256, 8, 8])
print(model(x).shape)
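Comparing the parameter counts of the two models above (a rough check, biases included):

# standard:  5*5*3*256 + 256 = 19456 parameters
# separable: (5*5*1*3 + 3) + (1*1*3*256 + 256) = 1102 parameters
print(sum(p.numel() for p in model.parameters()))  # 1102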

5.3 Comparison of computational cost

Standard convolution

(figure: computational cost of standard convolution)

Depthwise separable convolution

(figure: computational cost of depthwise separable convolution)

Therefore:

(figure: the ratio of the two costs)

We usually use 3×3 kernels, so the computation drops to roughly one-ninth to one-eighth of the original.
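A quick sanity check of that ratio (a sketch that counts multiplications only, assuming stride 1 and identical output sizes): a standard convolution costs K·K·M·N·H·W, the separable version costs K·K·M·H·W + M·N·H·W, so the ratio is 1/N + 1/K².

# kernel size, input channels, output channels, output height/width
K, M, N, H, W = 3, 3, 256, 8, 8
standard = K * K * M * N * H * W
separable = K * K * M * H * W + M * N * H * W
print(separable / standard)   # ~0.115, i.e. about 11% of the cost
print(1 / N + 1 / K ** 2)     # the same value from the closed form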

5.4 Code implementation

(figure: the depthwise separable convolution block)

"""
     深度分离卷积
"""
 
import torch
import torch.nn as nn
 
 
class Depth_Wise_Conv(nn.Module):
    """
        深度可分离卷积 = 深度卷积 + 逐点卷积调整通道
    """
    def __init__(self, in_channel, out_channel):
        super(Depth_Wise_Conv, self).__init__()
        # 深度卷积
        self.conv_group = nn.Conv2d(in_channel, in_channel, kernel_size=3, stride=1, padding=1,                                                  groups=in_channel)
        # 逐点卷积调整通道
        self.conv_point = nn.Conv2d(in_channel, out_channel, kernel_size=1, stride=1, padding=1, groups=1)
        # BN
        self.bn = nn.BatchNorm2d(out_channel)
        # activate
        self.act = nn.ReLU()
 
    def forward(self, inputs):
        """
            前向传播
        """
        x = self.conv_group(inputs)
        x = self.conv_point(x)
        x = self.bn(x)
        x = self.act(x)
        return x
 
 
if __name__ == '__main__':
    # 均匀分布产生数据
    x = torch.rand(1, 3, 16, 16)
    model = Depth_Wise_Conv(3, 16)
    model = model(x)
    print(model)
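For reference, a quick comparison against a plain 3×3 convolution with the same input and output channels (the BN parameters are included in the separable block):

# standard 3×3 conv: 3*3*3*16 + 16 = 448 parameters
print(sum(p.numel() for p in nn.Conv2d(3, 16, kernel_size=3, padding=1).parameters()))  # 448
# separable block: (3*3*1*3 + 3) + (1*1*3*16 + 16) + 32 (BN) = 126 parameters
print(sum(p.numel() for p in Depth_Wise_Conv(3, 16).parameters()))  # 126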

Reference blogs:
Images: https://animatedai.github.io/
Summary of commonly used convolutions: https://zhuanlan.zhihu.com/p/490761167
A "tour" of lightweight neural networks (2), MobileNet from V1 to V3: https://zhuanlan.zhihu.com/p/70703846
