5.3 Multiple input channels and multiple output channels
The inputs and outputs used in the previous two sections were two-dimensional arrays, but real data often has more dimensions. For example, a color image has three RGB (red, green, and blue) color channels in addition to its height and width. If the height and width of the image are h and w pixels, it can be represented as a multi-dimensional array of shape 3 * h * w. We call the dimension of size 3 the channel dimension. This section introduces convolution kernels with multiple input channels or multiple output channels.
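As a small illustration of the channel dimension, the sketch below builds a random tensor that stands in for a color image. The sizes h = 4 and w = 5 are arbitrary values chosen for this example and do not come from the text.

```python
import torch

h, w = 4, 5                   # hypothetical image height and width in pixels
image = torch.rand(3, h, w)   # channel dimension first: 3 * h * w

print(image.shape)            # torch.Size([3, 4, 5])

red = image[0]                # a single channel is an ordinary 2-D array
print(red.shape)              # torch.Size([4, 5])
```

Indexing along dimension 0 selects one channel, which is why the functions in this section loop over that dimension.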
5.3.1 Multiple input channels
Next, we implement the cross-correlation operation for multiple input channels. We only need to perform the cross-correlation operation on each channel separately and then add the per-channel results together.
import torch
import torch.nn as nn
import sys
sys.path.append("..")
import d2lzh_pytorch as d2l
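def corr2d_multi_in(X, K):
    # Cross-correlate along dimension 0 (the channel dimension) of X and K,
    # one channel at a time, then add the results together
    res = d2l.corr2d(X[0, :, :], K[0, :, :])
    print(res)  # debug output: the result for the first channel only
    for i in range(1, X.shape[0]):  # X.shape[0] is the number of channels (2 in this example)
        res += d2l.corr2d(X[i, :, :], K[i, :, :])
    return res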
X = torch.tensor([[[0,1,2],[3,4,5],[6,7,8]],[[1,2,3], [4,5,6], [7,8,9]] ])
K = torch.tensor([[[0,1],[2,3]], [[1,2],[3,4]]])
corr2d_multi_in(X, K)
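If the d2lzh_pytorch package is not at hand, the same computation can be sketched in a self-contained way. The local corr2d below is an assumption: a stand-in written to behave like the single-channel d2l.corr2d used above. The name corr2d_multi_in_sum is hypothetical and introduced only for this example.

```python
import torch

def corr2d(X, K):
    """Single-channel 2-D cross-correlation (assumed stand-in for d2l.corr2d)."""
    h, w = K.shape
    Y = torch.zeros(X.shape[0] - h + 1, X.shape[1] - w + 1)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            # Elementwise product of the current window with the kernel, then sum
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

def corr2d_multi_in_sum(X, K):
    # Pair each input channel with its kernel channel, cross-correlate, and add
    return sum(corr2d(x, k) for x, k in zip(X, K))

X = torch.tensor([[[0, 1, 2], [3, 4, 5], [6, 7, 8]],
                  [[1, 2, 3], [4, 5, 6], [7, 8, 9]]], dtype=torch.float32)
K = torch.tensor([[[0, 1], [2, 3]],
                  [[1, 2], [3, 4]]], dtype=torch.float32)

Y = corr2d_multi_in_sum(X, K)
print(Y)  # tensor([[ 56.,  72.], [104., 120.]])
```

For the top-left output element, channel 0 contributes 0*0 + 1*1 + 3*2 + 4*3 = 19 and channel 1 contributes 1*1 + 2*2 + 4*3 + 5*4 = 37, which add up to 56.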
5.3.2 Multiple output channels
With multiple input channels, the per-channel results are added together, so the output always has exactly one channel regardless of the number of input channels. Let the numbers of input and output channels of the convolution kernel be c_i and c_o, and let its height and width be k_h and k_w. To obtain an output with multiple channels, we create a kernel array of shape c_i * k_h * k_w for each output channel and stack these arrays along the output channel dimension, so that the convolution kernel has shape c_o * c_i * k_h * k_w. In the cross-correlation operation, the result on each output channel is computed from the kernel array for that output channel and the entire input array.
Simply put, to output N channels you need to create N convolution kernels, each of shape C * H * W (input channels * kernel height * kernel width).
Next, we implement a cross-correlation function that computes the output for multiple output channels.
def corr2d_multi_in_out(X, K):
    # Iterate over dimension 0 of K, cross-correlating each kernel with the input X,
    # then merge all the results with torch.stack
    return torch.stack([corr2d_multi_in(X, k) for k in K])
We stack the kernel array K with K+1 (each element of K plus one) and K+2 to construct a convolution kernel with 3 output channels.
K = torch.tensor([[[0,1],[2,3]], [[1,2],[3,4]]])
# Construct 3 convolution kernels, one per output channel
K = torch.stack([K, K+1, K+2])
K.shape  # torch.Size([3, 2, 2, 2])
Next, we perform the cross-correlation operation on the input array X and the kernel array K. The output now contains 3 channels. The result on the first channel matches the earlier result computed from the input array X and the kernel with multiple input channels and a single output channel.
# The input has shape 2 * 3 * 3; the output has shape 3 * (3 - 2 + 1) * (3 - 2 + 1) = 3 * 2 * 2
corr2d_multi_in_out(X, K)
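One way to cross-check the multi-output result is to compare it against PyTorch's built-in torch.nn.functional.conv2d, which computes cross-correlation with a weight of shape c_o * c_i * k_h * k_w. This is only a verification sketch and not part of the original implementation. The batch dimension added with unsqueeze is required by conv2d and is removed again afterwards.

```python
import torch
import torch.nn.functional as F

X = torch.tensor([[[0, 1, 2], [3, 4, 5], [6, 7, 8]],
                  [[1, 2, 3], [4, 5, 6], [7, 8, 9]]], dtype=torch.float32)
K = torch.tensor([[[0, 1], [2, 3]],
                  [[1, 2], [3, 4]]], dtype=torch.float32)
K = torch.stack([K, K + 1, K + 2])           # shape (3, 2, 2, 2): c_o, c_i, k_h, k_w

# conv2d expects a batch dimension: (1, c_i, h, w) in, (1, c_o, h', w') out
Y = F.conv2d(X.unsqueeze(0), K).squeeze(0)

print(Y.shape)  # torch.Size([3, 2, 2])
print(Y)
# tensor([[[ 56.,  72.], [104., 120.]],
#         [[ 76., 100.], [148., 172.]],
#         [[ 96., 128.], [192., 224.]]])
```

The first output channel equals the single-output result computed earlier. Each later channel grows by the sum of the input window, because every kernel element was increased by one.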
5.3.3 1 * 1 convolutional layer
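def corr2d_multi_in_out_1x1(X, K):
    c_i, h, w = X.shape
    c_o = K.shape[0]
    X = X.view(c_i, h * w)  # flatten the spatial dimensions: one column per pixel
    K = K.view(c_o, c_i)    # a 1 * 1 kernel is just a c_o * c_i weight matrix
    Y = torch.mm(K, X)  # matrix multiplication, as in a fully connected layer
    return Y.view(c_o, h, w)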
X = torch.rand(3, 3, 3)
K = torch.rand(2, 3, 1, 1)
Y1 = corr2d_multi_in_out_1x1(X, K)
Y2 = corr2d_multi_in_out(X, K)
(Y1 - Y2).norm().item() < 1e-6
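The comparison above shows that a 1 * 1 convolution gives the same result as a matrix multiplication applied at every pixel. As a further hedged check, the sketch below compares that matrix form with PyTorch's built-in torch.nn.functional.conv2d. The fixed seed and the variable names Y_mm and Y_conv are choices made for this example only.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)                  # fixed seed so the run is repeatable
c_i, c_o, h, w = 3, 2, 3, 3
X = torch.rand(c_i, h, w)
K = torch.rand(c_o, c_i, 1, 1)

# Matrix form: (c_o, c_i) @ (c_i, h*w) -> (c_o, h*w), reshaped back to (c_o, h, w)
Y_mm = (K.reshape(c_o, c_i) @ X.reshape(c_i, h * w)).reshape(c_o, h, w)

# Built-in cross-correlation with a 1 * 1 kernel (batch dimension added and removed)
Y_conv = F.conv2d(X.unsqueeze(0), K).squeeze(0)

close = torch.allclose(Y_mm, Y_conv, atol=1e-6)
print(close)
```

Each output pixel depends only on the c_i input values at the same position, so the 1 * 1 kernel mixes channels without looking at neighboring pixels.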