[pytorch, learning]-5.3 Multiple input channels and multiple output channels


5.3 Multiple input channels and multiple output channels

The inputs and outputs used in the previous two sections are two-dimensional arrays, but real data often has more dimensions. For example, a color image has three RGB (red, green, blue) color channels in addition to its height and width. If the height and width of a color image are h and w (pixels), it can be represented as a multi-dimensional array of shape 3 * h * w. We call the dimension of size 3 the channel dimension. This section introduces convolution kernels with multiple input channels or multiple output channels.
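To make the channel dimension concrete, here is a minimal sketch that builds a tiny 3 * h * w array from plain nested Python lists. The helper name `shape_of` and the pixel values are made up for illustration; in PyTorch the same information is available through the `shape` attribute of a tensor.

```python
def shape_of(arr):
    """Return the shape of a regular, non-empty nested list as a tuple."""
    shape = []
    while isinstance(arr, list):
        shape.append(len(arr))
        arr = arr[0]
    return tuple(shape)

h, w = 2, 4  # hypothetical height and width in pixels
# One h * w plane per color channel (R, G, B); the values are arbitrary.
image = [[[c * 100 + i * w + j for j in range(w)] for i in range(h)] for c in range(3)]

print(shape_of(image))     # (channels, height, width)
print(shape_of(image[0]))  # a single channel is an ordinary 2-D array
```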

5.3.1 Multiple input channels

Next, we implement the cross-correlation calculation with multiple input channels. We only need to perform a cross-correlation operation on each channel with its corresponding kernel slice, and then add up the per-channel results.

import torch
import torch.nn as nn
import sys
sys.path.append("..")
import d2lzh_pytorch  as d2l

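def corr2d_multi_in(X, K):
    # Compute the cross-correlation along dimension 0 (the channel dimension) of X and K, then sum the results
    res = d2l.corr2d(X[0, :, :], K[0, :, :])
    print(res)  # partial result from channel 0 only
    for i in range(1, X.shape[0]):  # X.shape[0] is the number of channels, 2 in this example
        res += d2l.corr2d(X[i, :, :], K[i, :, :])
    return res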
X = torch.tensor([[[0,1,2],[3,4,5],[6,7,8]],[[1,2,3], [4,5,6], [7,8,9]] ])

K = torch.tensor([[[0,1],[2,3]], [[1,2],[3,4]]])

corr2d_multi_in(X, K)
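The helper `d2l.corr2d` comes from the d2lzh_pytorch package, which may not be installed everywhere. As a rough, dependency-free sketch of the same computation, the version below works on plain Python lists and assumes stride 1 and no padding. The names `corr2d_list` and `corr2d_multi_in_list` are hypothetical and are not part of d2lzh_pytorch or PyTorch.

```python
def corr2d_list(X, K):
    """2-D cross-correlation of one channel X with one kernel K (nested lists)."""
    h, w = len(K), len(K[0])
    out_h, out_w = len(X) - h + 1, len(X[0]) - w + 1
    return [[sum(X[i + a][j + b] * K[a][b] for a in range(h) for b in range(w))
             for j in range(out_w)]
            for i in range(out_h)]

def corr2d_multi_in_list(X, K):
    """Cross-correlate each channel with its kernel slice, then sum over channels."""
    per_channel = [corr2d_list(x, k) for x, k in zip(X, K)]
    rows, cols = len(per_channel[0]), len(per_channel[0][0])
    return [[sum(y[i][j] for y in per_channel) for j in range(cols)] for i in range(rows)]

X = [[[0, 1, 2], [3, 4, 5], [6, 7, 8]], [[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
K = [[[0, 1], [2, 3]], [[1, 2], [3, 4]]]

print(corr2d_list(X[0], K[0]))     # channel 0 alone: [[19, 25], [37, 43]]
print(corr2d_list(X[1], K[1]))     # channel 1 alone: [[37, 47], [67, 77]]
print(corr2d_multi_in_list(X, K))  # element-wise sum: [[56, 72], [104, 120]]
```

For example, the top-left output element is (0*0 + 1*1 + 3*2 + 4*3) + (1*1 + 2*2 + 4*3 + 5*4) = 19 + 37 = 56.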


5.3.2 Multiple output channels

When there are multiple input channels, the results of the individual channels are summed, so the number of output channels is always 1 regardless of the number of input channels. Let the numbers of input and output channels be c(i) and c(o), and let the height and width of the kernel be k(h) and k(w). To get an output with multiple channels, we create a kernel array of shape c(i) * k(h) * k(w) for each output channel. Concatenating them along the output channel dimension gives a convolution kernel of shape c(o) * c(i) * k(h) * k(w). In the cross-correlation operation, the result on each output channel is computed from that output channel's kernel array and the entire input array.

Simply put, to output N channels you need N kernels, each of shape c(i) * k(h) * k(w).
Next, we implement a cross-correlation function that computes the output of multiple channels.

def corr2d_multi_in_out(X, K):
    # Iterate over dimension 0 of K, cross-correlating each kernel with the input X; merge all results with torch.stack
    return torch.stack([corr2d_multi_in(X, k) for k in K])
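The shape bookkeeping behind this function can be sketched as a small helper. This is only an illustration under the assumptions of this section (stride 1, no padding); the name `multi_channel_output_shape` is hypothetical and not a PyTorch function.

```python
def multi_channel_output_shape(x_shape, k_shape):
    """Output shape for an input of shape c_i * h * w and a kernel of shape
    c_o * c_i * k_h * k_w, assuming stride 1 and no padding."""
    c_i, h, w = x_shape
    c_o, k_c_i, k_h, k_w = k_shape
    if c_i != k_c_i:
        raise ValueError("kernel and input must have the same number of input channels")
    return (c_o, h - k_h + 1, w - k_w + 1)

# A 2-channel 3 * 3 input with three 2 * 2 * 2 kernels gives 3 output channels.
print(multi_channel_output_shape((2, 3, 3), (3, 2, 2, 2)))  # (3, 2, 2)
# A 1 * 1 kernel keeps the height and width and only changes the channel count.
print(multi_channel_output_shape((3, 3, 3), (2, 3, 1, 1)))  # (2, 3, 3)
```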

We stack the kernel array K with K+1 (every element of K plus one) and K+2 to build a convolution kernel with 3 output channels.

K = torch.tensor([[[0,1],[2,3]], [[1,2],[3,4]]])


# Build 3 convolution kernels
K = torch.stack([K, K+1, K+2])
K.shape

Next, we perform the cross-correlation operation on the input array X and the kernel array K. The output now contains 3 channels. The result of the first channel matches the earlier result computed from the input array X and the kernel with multiple input channels and a single output channel.

# Input shape is 2 * 3 * 3; output shape is 3 * (3 - 2 + 1) * (3 - 2 + 1)
corr2d_multi_in_out(X, K)
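As a cross-check that does not depend on PyTorch, the sketch below repeats the multi-output computation on plain Python lists, again assuming stride 1 and no padding. The function names `multi_in_corr` and `multi_in_out_corr` are hypothetical stand-ins for illustration only.

```python
def multi_in_corr(X, K):
    """Multi-input-channel cross-correlation on nested lists: sum over channels
    and over the kernel window at every output position."""
    k_h, k_w = len(K[0]), len(K[0][0])
    out_h, out_w = len(X[0]) - k_h + 1, len(X[0][0]) - k_w + 1
    return [[sum(X[c][i + a][j + b] * K[c][a][b]
                 for c in range(len(X)) for a in range(k_h) for b in range(k_w))
             for j in range(out_w)]
            for i in range(out_h)]

def multi_in_out_corr(X, K):
    """Apply one c_i * k_h * k_w kernel per output channel and collect the results."""
    return [multi_in_corr(X, k) for k in K]

X = [[[0, 1, 2], [3, 4, 5], [6, 7, 8]], [[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
K0 = [[[0, 1], [2, 3]], [[1, 2], [3, 4]]]
# Mirror torch.stack([K, K+1, K+2]): add 0, 1 and 2 to every kernel element.
K = [[[[v + d for v in row] for row in plane] for plane in K0] for d in range(3)]

Y = multi_in_out_corr(X, K)
for y in Y:
    print(y)
# [[56, 72], [104, 120]]
# [[76, 100], [148, 172]]
# [[96, 128], [192, 224]]
```

The first output channel equals the single-output result, since the first kernel in the stack is the original K.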


5.3.3 1 * 1 convolutional layer


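def corr2d_multi_in_out_1x1(X, K):
    c_i, h, w = X.shape
    c_o = K.shape[0]
    X = X.view(c_i, h * w)  # flatten each input channel into a row
    K = K.view(c_o, c_i)    # drop the trailing 1 * 1 kernel dimensions
    Y = torch.mm(K, X)  # matrix multiplication, as in a fully connected layer
    return Y.view(c_o, h, w)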
X = torch.rand(3, 3, 3)
K = torch.rand(2, 3, 1, 1)

Y1 = corr2d_multi_in_out_1x1(X, K)
Y2 = corr2d_multi_in_out(X, K)

(Y1 - Y2).norm().item()  < 1e-6
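To see why the matrix multiplication and the cross-correlation agree, the following sketch computes a 1 * 1 convolution in two ways on plain Python lists with integer values, so the comparison is exact. The function names and the sample values are assumptions chosen for illustration; K is written directly as a c_o * c_i matrix with the trailing 1 * 1 dimensions dropped.

```python
def conv1x1_as_matmul(X, K):
    """1 * 1 convolution as a (c_o x c_i) by (c_i x h*w) matrix product."""
    c_i, h, w = len(X), len(X[0]), len(X[0][0])
    # Flatten each channel into a row of length h * w.
    X_flat = [[X[c][i][j] for i in range(h) for j in range(w)] for c in range(c_i)]
    # Each output row is a weighted sum of the input channel rows.
    Y_flat = [[sum(k_row[c] * X_flat[c][p] for c in range(c_i)) for p in range(h * w)]
              for k_row in K]
    # Reshape every flat row of length h * w back into h rows of w values.
    return [[row[i * w:(i + 1) * w] for i in range(h)] for row in Y_flat]

def conv1x1_direct(X, K):
    """Same result computed pixel by pixel: mix the channels at each position."""
    c_i, h, w = len(X), len(X[0]), len(X[0][0])
    return [[[sum(k_row[c] * X[c][i][j] for c in range(c_i)) for j in range(w)]
             for i in range(h)]
            for k_row in K]

X = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]]  # c_i = 3, h = w = 2
K = [[1, 0, -1], [2, 1, 0]]                                     # c_o = 2, c_i = 3

print(conv1x1_as_matmul(X, K))  # [[[-8, -8], [-8, -8]], [[7, 10], [13, 16]]]
print(conv1x1_as_matmul(X, K) == conv1x1_direct(X, K))  # True
```

A 1 * 1 kernel never looks at neighboring pixels, so all it does is form a weighted combination of the channels at each position, which is what a fully connected layer applied per pixel would compute.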



Origin blog.csdn.net/piano9425/article/details/107199288