Principle of two-dimensional convolutional layer

This article introduces the working principle of the two-dimensional convolutional layer.

A convolutional neural network is a neural network that contains convolutional layers.

1. Two-dimensional cross-correlation operation

In a two-dimensional convolutional layer, a two-dimensional input array and a two-dimensional kernel array produce a two-dimensional output array through a cross-correlation operation.
For example:
Input array: a 3x3 two-dimensional array
Kernel array: a 2x2 two-dimensional array (in convolution computations this array is also called the convolution kernel or filter)
[Figure 5.1: two-dimensional cross-correlation operation]
In the two-dimensional cross-correlation operation, the convolution window starts at the top left of the input array and slides over it from left to right and from top to bottom. When the window reaches a given position, the input sub-array inside the window and the kernel array are multiplied element-wise and summed, giving the element at the corresponding position of the output array.

The above process is implemented in the corr2d function, which accepts the input array X and the kernel array K, and outputs the array Y.

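import torch
from torch import nn

def corr2d(X, K):  # This function is saved in the d2lzh_pytorch package for later use
    """Compute the two-dimensional cross-correlation of X with kernel K."""
    h, w = K.shape
    # With no padding and a stride of 1, the output shrinks by (h - 1, w - 1)
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            # Multiply the current window by the kernel element-wise and sum
            Y[i, j] = (X[i: i + h, j: j + w] * K).sum()
    return Y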

Verification with a 3x3 input array X and a 2x2 kernel array K:

X = torch.tensor([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
K = torch.tensor([[0, 1], [2, 3]])
corr2d(X, K)

Output:

tensor([[19., 25.],
        [37., 43.]])
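As a cross-check, the same result can be reproduced with PyTorch's built-in torch.nn.functional.conv2d, which, despite its name, computes a cross-correlation. The sketch below is not part of the original text: the names Y_loop and Y_builtin are illustrative, the inputs are created as float tensors, and they are reshaped to the (batch, channel, height, width) layout that conv2d expects.

```python
import torch
import torch.nn.functional as F

def corr2d(X, K):
    # Same loop-based cross-correlation as defined above
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i: i + h, j: j + w] * K).sum()
    return Y

# The same values as the verification example, stored as floats
X = torch.arange(9, dtype=torch.float32).reshape(3, 3)
K = torch.arange(4, dtype=torch.float32).reshape(2, 2)

Y_loop = corr2d(X, K)

# conv2d expects 4-D tensors: (batch, channels, height, width)
Y_builtin = F.conv2d(X.reshape(1, 1, 3, 3), K.reshape(1, 1, 2, 2)).reshape(2, 2)

print(Y_loop.shape)                        # a 3x3 input and a 2x2 kernel give a 2x2 output
print(torch.allclose(Y_loop, Y_builtin))   # True
```

The output shape follows directly from the formula in corr2d: (3 - 2 + 1) x (3 - 2 + 1) = 2 x 2.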

2. Two-dimensional convolutional layer

A two-dimensional convolutional layer cross-correlates its input with a convolution kernel and adds a scalar bias to produce the output. The model parameters of the convolutional layer are the convolution kernel and the scalar bias. When training the model, we usually initialize the convolution kernel randomly and then update the kernel and the bias iteratively.

A custom two-dimensional convolutional layer can be implemented on top of the corr2d function. The constructor __init__ declares the two model parameters, weight and bias. The forward computation function forward calls corr2d directly and adds the bias.

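class Conv2D(nn.Module):
    def __init__(self, kernel_size):
        super(Conv2D, self).__init__()
        # Learnable parameters: a randomly initialized kernel and a scalar bias
        self.weight = nn.Parameter(torch.randn(kernel_size))
        self.bias = nn.Parameter(torch.randn(1))

    def forward(self, x):
        # Cross-correlate the input with the kernel, then add the bias
        return corr2d(x, self.weight) + self.bias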

A convolutional layer whose convolution window has shape p×q is called a p×q convolutional layer. Similarly, a p×q convolution or a p×q convolution kernel means that the height and width of the convolution kernel are p and q, respectively.
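To make the "randomly initialize, then update iteratively" step concrete, here is a minimal training sketch. It is an illustration, not the source's own procedure: the target kernel K_true, the bias b_true, the random 6x6 input, the squared-error loss, the learning rate, and the number of steps are all assumptions chosen for the example.

```python
import torch
from torch import nn

def corr2d(X, K):
    # Loop-based two-dimensional cross-correlation
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i: i + h, j: j + w] * K).sum()
    return Y

class Conv2D(nn.Module):
    def __init__(self, kernel_size):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(kernel_size))
        self.bias = nn.Parameter(torch.randn(1))

    def forward(self, x):
        return corr2d(x, self.weight) + self.bias

torch.manual_seed(0)  # fixed seed so the run is repeatable

# Hypothetical target: a known kernel and bias that the layer should recover
K_true = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
b_true = 0.5
X = torch.randn(6, 6)
Y_target = corr2d(X, K_true) + b_true

layer = Conv2D(kernel_size=(2, 2))  # a 2x2 convolutional layer
optimizer = torch.optim.SGD(layer.parameters(), lr=0.05)

losses = []
for step in range(200):
    loss = ((layer(X) - Y_target) ** 2).mean()  # mean squared error
    optimizer.zero_grad()
    loss.backward()   # gradients flow through corr2d to weight and bias
    optimizer.step()
    losses.append(loss.item())

print(losses[0], losses[-1])   # the loss should shrink over the iterations
print(layer.weight.data)       # the learned kernel, to compare against K_true
```

Because the loss is a simple quadratic in the kernel and bias, plain gradient descent with a small step size is enough here; real models would typically use mini-batches and many input examples.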

3. Cross-correlation operation and convolution operation

The convolution operation is closely related to the cross-correlation operation. To obtain the output of a convolution, we only need to flip the kernel array horizontally and vertically and then cross-correlate it with the input array. So although the two operations are similar, they generally produce different outputs for the same input when they use the same kernel array.

You may wonder why a convolutional layer can use cross-correlation instead of convolution. The reason is that the kernel array is learned in deep learning: whether the convolutional layer uses cross-correlation or convolution does not affect the model's predictions, because a layer that used true convolution would simply learn the flipped kernel.
To be consistent with most deep learning literature, unless otherwise specified, the convolution operations mentioned in this article refer to cross-correlation operations.
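The relationship described in this section can be sketched in a few lines. The helper name conv2d_true is hypothetical and introduced only for this illustration; it flips the kernel with torch.flip along both axes and then reuses corr2d.

```python
import torch

def corr2d(X, K):
    # Loop-based two-dimensional cross-correlation
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i: i + h, j: j + w] * K).sum()
    return Y

def conv2d_true(X, K):
    # True convolution: flip the kernel up-down and left-right,
    # then cross-correlate (hypothetical helper for this example)
    return corr2d(X, torch.flip(K, dims=[0, 1]))

X = torch.arange(9, dtype=torch.float32).reshape(3, 3)
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])

Y_corr = corr2d(X, K)       # values [[19, 25], [37, 43]]
Y_conv = conv2d_true(X, K)  # values [[5, 11], [23, 29]]
print(Y_corr)
print(Y_conv)

# Same kernel, different operation: the outputs differ
print(torch.equal(Y_corr, Y_conv))  # False

# A convolution-based layer holding the flipped kernel reproduces the
# cross-correlation output exactly, so learning absorbs the difference
K_flipped = torch.flip(K, dims=[0, 1])
print(torch.equal(conv2d_true(X, K_flipped), Y_corr))  # True
```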

4. Feature map and receptive field

The two-dimensional array output by a two-dimensional convolutional layer can be regarded as a representation of the input at a certain level in the spatial dimensions (width and height), and is also called a feature map. All the input regions (which may be larger than the actual size of the input) that can affect the forward computation of an element x are called the receptive field of x.

Take the figure above as an example: the four shaded elements of the input are the receptive field of the shaded element in the output. Denote the 2×2 output in Figure 5.1 as Y, and consider a deeper convolutional neural network that cross-correlates Y with another 2×2 kernel array to output a single element z. The receptive field of z on Y then includes all four elements of Y, and its receptive field on the input includes all nine input elements. A deeper convolutional neural network can therefore widen the receptive field of a single element in the feature map, allowing it to capture larger features of the input.
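One way to check these receptive fields numerically is to ask autograd which inputs influence a given output: an input element lies in the receptive field if the gradient with respect to it is non-zero. This is a sketch under assumptions not found in the source: both kernels are set to all ones (so that no gradient vanishes by accident), and the variable names are illustrative.

```python
import torch

def corr2d(X, K):
    # Loop-based two-dimensional cross-correlation
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i: i + h, j: j + w] * K).sum()
    return Y

X = torch.tensor([[0.0, 1.0, 2.0],
                  [3.0, 4.0, 5.0],
                  [6.0, 7.0, 8.0]], requires_grad=True)
K1 = torch.ones(2, 2)  # assumed all-ones kernel for the first layer
K2 = torch.ones(2, 2)  # assumed all-ones kernel for the second layer

Y = corr2d(X, K1)      # first layer: 3x3 input -> 2x2 feature map
z = corr2d(Y, K2)      # second layer: 2x2 feature map -> single element

# Receptive field of z on Y: every element of Y
g_zY, = torch.autograd.grad(z.sum(), Y, retain_graph=True)

# Receptive field of the top-left element of Y on the input: a 2x2 patch
g_y0, = torch.autograd.grad(Y[0, 0], X, retain_graph=True)

# Receptive field of z on the input: all nine elements
g_zX, = torch.autograd.grad(z.sum(), X)

print(g_zY != 0)
print(g_y0 != 0)
print(g_zX != 0)
```

With all-ones kernels, the gradient of z with respect to the input also counts how many windows each input element takes part in: corners once, edges twice, and the center four times.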

Summary:
The core computation of a two-dimensional convolutional layer is the two-dimensional cross-correlation operation. In its simplest form, the layer cross-correlates the two-dimensional input data with the convolution kernel and then adds a bias.
The convolution kernel can be learned from data.

Reference: "Hands-on Deep Learning" https://github.com/ShusenTang/Dive-into-DL-PyTorch