Multiple input channels and multiple output channels

Study notes for "Dive into Deep Learning" (PyTorch edition), intended for personal review only.


In the first two sections, the inputs and outputs we used were two-dimensional arrays, but real data often has higher dimensionality. For example, a color image has 3 color channels (RGB: red, green, blue) in addition to its height and width. Assuming the height and width of a color image are h and w (in pixels), it can be represented as a 3 × h × w multi-dimensional array. We call this dimension of size 3 the channel dimension. In this section we introduce convolution kernels with multiple input channels or multiple output channels.
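
As a quick illustration (the 224 × 224 image size here is an arbitrary assumption), such an image corresponds to a tensor whose first dimension is the channel dimension:

import torch
img = torch.rand(3, 224, 224)  # a hypothetical RGB image: (channel, height, width)
img.shape  # torch.Size([3, 224, 224])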

Multiple input channels

When the input data contains multiple channels, we need to construct a convolution kernel with the same number of input channels as the input data, so that it can perform a cross-correlation computation with the multi-channel input.

Figure 5.4 shows an example of two-dimensional cross-correlation with 2 input channels. On each channel, the two-dimensional input array and the two-dimensional kernel array perform a cross-correlation operation, and the per-channel results are then summed to produce the output.

(Figure 5.4: two-dimensional cross-correlation computation with 2 input channels)
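
For example, with the input and kernel values used below, the top-left output element in Figure 5.4 is (0×0 + 1×1 + 3×2 + 4×3) + (1×1 + 2×2 + 4×3 + 5×4) = 19 + 37 = 56.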

Next, we implement the cross-correlation computation with multiple input channels. We simply perform a cross-correlation operation on each channel and then add the results together.

import torch
from torch import nn
import sys
sys.path.append("..")
import d2lzh_pytorch as d2l
def corr2d_multi_in(X, K):
    # Iterate over the 0th (channel) dimension of X and K, compute the cross-correlation on each channel, then add the results together
    res = d2l.corr2d(X[0, :, :], K[0, :, :])
    for i in range(1, X.shape[0]):
        res += d2l.corr2d(X[i, :, :], K[i, :, :])
    return res
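
If the d2lzh_pytorch package is not at hand, a minimal stand-in for d2l.corr2d, mirroring the single-channel cross-correlation from the earlier sections, can be sketched as:

def corr2d(X, K):
    # single-channel two-dimensional cross-correlation
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i: i + h, j: j + w] * K).sum()
    return Y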

We can construct the input array X and the kernel array K in Figure 5.4 to verify the output of the cross-correlation operation. 

X = torch.tensor([[[0, 1, 2], [3, 4, 5], [6, 7, 8]],[[1, 2, 3], [4, 5, 6], [7, 8, 9]]])
K = torch.tensor([[[0, 1], [2, 3]], [[1, 2], [3, 4]]])
corr2d_multi_in(X, K)

Output:

tensor([[ 56.,  72.],
        [104., 120.]])
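
As a cross-check not in the original notes, PyTorch's built-in torch.nn.functional.conv2d also computes a cross-correlation, so it gives the same result once we add batch and output-channel dimensions (and cast the integer tensors to float):

import torch.nn.functional as F
# input shape (batch, c_i, h, w) = (1, 2, 3, 3); weight shape (c_o, c_i, kh, kw) = (1, 2, 2, 2)
F.conv2d(X.float().unsqueeze(0), K.float().unsqueeze(0)).squeeze()
# tensor([[ 56.,  72.],
#         [104., 120.]])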

Multiple output channels

When there are multiple input channels, the cross-correlation results are summed over channels, so the output always has a single channel. To obtain an output with c_o channels, we create a kernel array of shape c_i × k_h × k_w for each output channel and stack them along the output channel dimension, giving a kernel array of shape c_o × c_i × k_h × k_w. Below we implement a cross-correlation function that computes the output of multiple channels.

def corr2d_multi_in_out(X, K):
    # Iterate over the 0th dimension of K; compute the cross-correlation of each
    # kernel with the input X, then merge all results with the stack function
    return torch.stack([corr2d_multi_in(X, k) for k in K])

We stack the kernel array K together with K + 1 (every element of K plus one) and K + 2 to construct a kernel array with 3 output channels.

K = torch.stack([K, K + 1, K + 2])
K.shape # torch.Size([3, 2, 2, 2])

Below we perform cross-correlation operations on the input array X and the kernel array K. The output now contains 3 channels. The result in the first channel is consistent with the earlier result computed from the input array X and the multi-input, single-output channel kernel.

corr2d_multi_in_out(X, K) 

Output:

tensor([[[ 56.,  72.],
         [104., 120.]],

        [[ 76., 100.],
         [148., 172.]],

        [[ 96., 128.],
         [192., 224.]]])
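
As a quick sanity check (added here, not in the original notes), the first output channel should match the earlier single-output result, since K[0] of the stacked kernel is the original kernel array:

torch.equal(corr2d_multi_in_out(X, K)[0], corr2d_multi_in(X, K[0]))  # True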

1×1 convolutional layer

Finally, we discuss multi-channel convolutional layers with a convolution window shape of 1×1. We usually call this a 1×1 convolutional layer, and the convolution operation in it a 1×1 convolution. Because the minimum window is used, a 1×1 convolution loses the ability of a convolutional layer to recognize patterns formed by adjacent elements in the height and width dimensions. In fact, the main computation of a 1×1 convolution happens on the channel dimension. Figure 5.5 shows the cross-correlation computation using a 1×1 convolution kernel with 3 input channels and 2 output channels. Note that the input and output have the same height and width. Each element in the output is a weighted sum, across channels, of the input elements at the same position in height and width. If we regard the channel dimension as the feature dimension and the elements in the height and width dimensions as data samples, then the 1×1 convolutional layer is equivalent to a fully connected layer.
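
In PyTorch such a layer would be built with nn.Conv2d and kernel_size=1; the batch size and the 5 × 5 spatial size below are arbitrary choices for illustration:

conv1x1 = nn.Conv2d(in_channels=3, out_channels=2, kernel_size=1)
Y = conv1x1(torch.rand(1, 3, 5, 5))  # input shape: (batch, c_i, h, w)
Y.shape  # torch.Size([1, 2, 5, 5]) -- height and width are unchanged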

Below we use matrix multiplication, as in a fully connected layer, to implement a 1×1 convolution. This requires some adjustments to the data shape before and after the matrix multiplication.

def corr2d_multi_in_out_1x1(X, K):
    c_i, h, w = X.shape
    c_o = K.shape[0]
    X = X.view(c_i, h * w)
    K = K.view(c_o, c_i)
    Y = torch.mm(K, X)  # matrix multiplication as in a fully connected layer
    return Y.view(c_o, h, w)

Let us verify that, when doing a 1×1 convolution, the function above is equivalent to the previously implemented cross-correlation function corr2d_multi_in_out.
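
The verification code itself does not appear in these notes; a sketch along the lines of the book's check (the random test data and the 1e-6 tolerance are assumptions) is:

X = torch.rand(3, 3, 3)
K = torch.rand(2, 3, 1, 1)
Y1 = corr2d_multi_in_out_1x1(X, K)
Y2 = corr2d_multi_in_out(X, K)
(Y1 - Y2).norm().item() < 1e-6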

Output: True

In later models we will see that the 1×1 convolutional layer is used as a fully connected layer that keeps the height and width dimensions unchanged. We can therefore control model complexity by adjusting the number of channels between network layers.

Summary

  • Using multiple channels expands the model parameters of the convolutional layer.
  • If the channel dimension is regarded as the feature dimension and the elements in the height and width dimensions are regarded as data samples, then the 1×1 convolutional layer is equivalent to a fully connected layer.
  • The 1×1 convolutional layer is usually used to adjust the number of channels between network layers and to control model complexity.

Origin: blog.csdn.net/dujuancao11/article/details/108491029