Two-dimensional convolutional layer

These study notes on "Practice Deep Learning (PyTorch)" are for personal review only.

A convolutional neural network is a neural network that contains convolutional layers. The convolutional neural networks introduced in this chapter all use the most common kind of convolutional layer, the two-dimensional convolutional layer. It has two spatial dimensions, height and width, and is commonly used to process image data.

Two-dimensional cross-correlation operation

Although the convolutional layer is named after the convolution operation, we usually use a more intuitive operation in convolutional layers: the cross-correlation operation. In a two-dimensional convolutional layer, a two-dimensional input array and a two-dimensional kernel array produce a two-dimensional output array through a cross-correlation operation.

We use a concrete example to explain the meaning of the two-dimensional cross-correlation operation. As shown in Figure 5.1, the input is a two-dimensional array with height and width both 3. We denote the shape of the array as 3×3 or (3, 3). The height and width of the kernel array are both 2. In convolution computations, this array is also called a convolution kernel or filter. The shape of the convolution kernel window (convolution window for short) is given by the height and width of the kernel, i.e. 2×2. In the two-dimensional cross-correlation operation, the convolution window starts at the top-left of the input array and slides over the input from left to right and top to bottom; at each position, the input subarray covered by the window is multiplied element-wise with the kernel array and summed, giving the element at the corresponding position in the output array. The shaded part in Figure 5.1 shows the first output element together with the input and kernel array elements used in its calculation: 0×0 + 1×1 + 3×2 + 4×3 = 19. In general, an input of shape nh×nw cross-correlated with a kernel of shape kh×kw produces an output of shape (nh − kh + 1) × (nw − kw + 1).

The process above is implemented in the corr2d function below. It accepts an input array X and a kernel array K, and returns the output array Y.

import torch
from torch import nn

def corr2d(X, K):
    # This function is also saved in the d2lzh_pytorch package for later use
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i: i + h, j: j + w] * K).sum()
    return Y

We can construct the input array X and the kernel array K in Figure 5.1 to verify the output of the two-dimensional cross-correlation operation.

X = torch.tensor([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
K = torch.tensor([[0, 1], [2, 3]])
corr2d(X, K)

Output:

tensor([[19., 25.],
        [37., 43.]])

Two-dimensional convolutional layer

A two-dimensional convolutional layer cross-correlates the input with the convolution kernel and adds a scalar bias to produce the output. The model parameters of a convolutional layer are the convolution kernel and the scalar bias. When training the model, we usually initialize the convolution kernel randomly and then iteratively update the kernel and the bias.
Below we use the corr2d function to implement a custom two-dimensional convolutional layer.

class Conv2D(nn.Module):
    def __init__(self, kernel_size):
        # In the constructor we declare the two model parameters: weight and bias
        super(Conv2D, self).__init__()
        self.weight = nn.Parameter(torch.randn(kernel_size))
        self.bias = nn.Parameter(torch.randn(1))

    def forward(self, x):
        # The forward computation calls corr2d directly and then adds the bias
        return corr2d(x, self.weight) + self.bias

A convolutional layer whose convolution window shape is p×q is called a p×q convolutional layer. Similarly, a p×q convolution or a p×q convolution kernel means that the height and width of the convolution kernel are p and q, respectively.
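
As a quick sanity check (our own snippet, not part of the original notes), we can apply an untrained Conv2D instance to the 3×3 input X constructed above. With a 2×2 kernel, the output shape is (3 − 2 + 1) × (3 − 2 + 1) = (2, 2):

conv = Conv2D(kernel_size=(2, 2))
print(conv(X.float()).shape)  # torch.Size([2, 2])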

Edge detection of objects in images

Now let's look at a simple application of a convolutional layer: detecting the edge of an object in an image, that is, finding the positions where the pixel values change. First, we construct a 6×8 image (that is, an image 6 pixels high and 8 pixels wide). Its middle four columns are black (0), and the rest are white (1).

X = torch.ones(6, 8)
X[:, 2:6] = 0
X

Output:

tensor([[1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.]])

Then we construct a convolution kernel K with height 1 and width 2. When it is cross-correlated with the input, if the two horizontally adjacent elements are the same, the output is 0; otherwise, the output is non-zero.

K = torch.tensor([[1, -1]]) 

Below we cross-correlate the input X with the convolution kernel K we designed. As you can see, the edge from white to black is detected as 1 and the edge from black to white as -1; the rest of the output is 0.

Y = corr2d(X, K)
Y

Output:

tensor([[ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.]])

From this we can see that, by applying the same convolution kernel repeatedly across the input, a convolutional layer can effectively characterize local patterns.
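
Note that this kernel only detects vertical edges. As a quick check (our own addition): if we transpose the image, every pair of horizontally adjacent elements is equal, so the same kernel produces an all-zero output.

corr2d(X.t(), K)  # an 8x5 tensor of zeros: K misses horizontal edges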

Learning a kernel array from data

As an example, we use the input X and output Y from the edge-detection example above to learn the kernel array K we constructed there. We first build a convolutional layer whose convolution kernel is initialized as a random array. Then, in each iteration, we use the squared error between Y and the convolutional layer's output as the loss, and compute the gradient to update the weight and the bias.

# Construct a two-dimensional convolutional layer with a kernel array of shape (1, 2)
conv2d = Conv2D(kernel_size=(1, 2))

step = 20
lr = 0.01
for i in range(step):
    Y_hat = conv2d(X)
    l = ((Y_hat - Y) ** 2).sum()
    l.backward()
    # Gradient descent
    conv2d.weight.data -= lr * conv2d.weight.grad
    conv2d.bias.data -= lr * conv2d.bias.grad
    # Reset the gradients to zero
    conv2d.weight.grad.fill_(0)
    conv2d.bias.grad.fill_(0)
    if (i + 1) % 5 == 0:
        print('Step %d, loss %.3f' % (i + 1, l.item()))

Output:

Step 5, loss 1.844
Step 10, loss 0.206
Step 15, loss 0.023
Step 20, loss 0.003

It can be seen that the error has been reduced to a relatively small value after 20 iterations. Now look at the parameters of the learned convolution kernel.

print("weight: ", conv2d.weight.data)
print("bias: ", conv2d.bias.data)

Output:

weight: tensor([[ 0.9948, -1.0092]])
bias: tensor([0.0080])

As you can see, the learned weight parameter is close to the kernel array K we defined earlier, while the bias parameter is close to 0.
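
For comparison, here is a sketch of the same experiment using PyTorch's built-in nn.Conv2d layer instead of our custom Conv2D (our own snippet; nn.Conv2d expects a 4-D input of shape (batch, channels, height, width), and we disable its bias to focus on the kernel):

# Built-in layer: 1 input channel, 1 output channel, kernel of shape (1, 2)
builtin_conv = nn.Conv2d(1, 1, kernel_size=(1, 2), bias=False)
X4 = X.reshape(1, 1, 6, 8)  # add batch and channel dimensions
Y4 = Y.reshape(1, 1, 6, 7)
optimizer = torch.optim.SGD(builtin_conv.parameters(), lr=0.01)
for i in range(20):
    l = ((builtin_conv(X4) - Y4) ** 2).sum()
    optimizer.zero_grad()
    l.backward()
    optimizer.step()
print(builtin_conv.weight.data.reshape(1, 2))  # should be close to [[1, -1]]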

Cross-correlation operation and convolution operation

In fact, the convolution operation is similar to the cross-correlation operation. To get the output of a convolution operation, we only need to flip the kernel array both left-right and up-down, and then cross-correlate it with the input array. So although the two operations are similar, if they use the same kernel array, their outputs for the same input are usually different.

You may then wonder why convolutional layers can use the cross-correlation operation instead of the convolution operation. The reason is that in deep learning the kernel arrays are learned: whether a convolutional layer uses cross-correlation or convolution does not affect the model's output at prediction time. To see why, suppose a convolutional layer learns the kernel array in Figure 5.1 using cross-correlation. With everything else unchanged, the kernel array learned using convolution would be the array in Figure 5.1 flipped both up-down and left-right. That is, when the input in Figure 5.1 is convolved with this learned, flipped kernel array, the output in Figure 5.1 is still obtained. Following this convention, the convolution operations mentioned in this book all refer to cross-correlation operations.
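
To make the relationship concrete, here is a minimal sketch (the helper name conv2d_true is ours, not from the text) that computes a true convolution by flipping the kernel and reusing corr2d:

def conv2d_true(X, K):
    # Proper convolution: flip the kernel vertically and horizontally,
    # then apply the cross-correlation from corr2d
    return corr2d(X, torch.flip(K, dims=[0, 1]))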

Feature map and receptive field

The two-dimensional array output by a two-dimensional convolutional layer can be regarded as a representation of the input at some level in the spatial dimensions (width and height), also known as a feature map. All the possible input regions (which may be larger than the actual size of the input) that affect the forward computation of an element x are called the receptive field of x. Taking Figure 5.1 as an example, the four elements in the shaded part of the input are the receptive field of the shaded element in the output. If we denote the 2×2 output in Figure 5.1 as Y and consider a deeper convolutional neural network that cross-correlates Y with another 2×2 kernel array to output a single element z, then the receptive field of z on Y includes all four elements of Y, while its receptive field on the input includes all nine input elements. Thus we can use a deeper convolutional neural network to make the receptive field of a single element in the feature map broader, and thereby capture larger-scale features of the input.

We often use the word "element" to describe a member of an array or matrix. In neural network terminology, these elements can also be called "units". When the meaning is clear, this book does not strictly distinguish between these two terms.
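
A short sketch of the deeper-network example above (K2 is a hypothetical second-layer kernel we introduce for illustration): cross-correlating the 2×2 output with a 2×2 kernel yields a single element z whose receptive field covers all nine input elements.

X1 = torch.tensor([[0., 1., 2.], [3., 4., 5.], [6., 7., 8.]])  # input from Figure 5.1
K1 = torch.tensor([[0., 1.], [2., 3.]])                        # first-layer kernel
K2 = torch.ones(2, 2)                                          # hypothetical second-layer kernel
Y1 = corr2d(X1, K1)   # shape (2, 2)
z = corr2d(Y1, K2)    # shape (1, 1): depends on all 9 elements of X1
print(z)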

Summary

  • The core calculation of the two-dimensional convolutional layer is a two-dimensional cross-correlation operation. In the simplest form, it performs a cross-correlation operation on the two-dimensional input data and the convolution kernel and then adds a bias.
  • We can design a convolution kernel to detect edges in an image.
  • We can learn the convolution kernel through data.
