Padding and stride

These are part of my study notes on "Dive into Deep Learning" (PyTorch edition), kept for my own review.

Padding

Padding means adding elements (usually zeros) on both sides of the input along its height and width. In Figure 5.2, we add zero-valued elements on both sides of the original input's height and width, so the input height and width grow from 3 to 5, and the output height and width grow from 2 to 4.
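As a quick check of the sizes above, the following sketch pads a 3×3 input with one ring of zeros and applies a 2×2 kernel. The 2×2 kernel size is an assumption inferred from the stated output sizes (a 3×3 input giving a 2×2 output without padding); the names X, K, X_padded and corr2d_shape are illustrative only.

```python
import torch
import torch.nn.functional as F

X = torch.arange(9, dtype=torch.float32).reshape(3, 3)  # 3x3 input
K = torch.ones(2, 2)  # 2x2 kernel (assumed size; the values are arbitrary)

# F.pad takes (left, right, top, bottom) for the last two dimensions.
X_padded = F.pad(X, (1, 1, 1, 1), value=0.0)

def corr2d_shape(inp, kernel):
    # F.conv2d expects (batch, channel, height, width) tensors,
    # so add two leading dimensions of size 1 to both arguments.
    out = F.conv2d(inp[None, None], kernel[None, None])
    return tuple(out.shape[2:])

print(X_padded.shape)             # torch.Size([5, 5])
print(corr2d_shape(X, K))         # (2, 2)
print(corr2d_shape(X_padded, K))  # (4, 4)
```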

Convolutional neural networks often use convolution kernels with odd height and width, such as 1, 3, 5, and 7, so that the amount of padding on both sides can be equal. For any two-dimensional array X, let the element in row i and column j be X[i,j]. When the padding on both sides is equal and the input and output have the same height and width, the output Y[i,j] is computed by cross-correlating the convolution kernel with the input window centered on X[i,j].
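To see why odd kernel sizes are convenient: with a stride of 1, keeping the output the same size as the input needs k - 1 padded rows (or columns) in total for a kernel of size k, and that total splits evenly only when k is odd. A minimal sketch, where the helper name split_padding is hypothetical:

```python
def split_padding(k):
    """Split the k - 1 rows of total padding needed for a same-size output
    (stride 1) into (before, after) amounts."""
    total = k - 1
    before = total // 2
    after = total - before
    return before, after

for k in (1, 2, 3, 4, 5, 7):
    before, after = split_padding(k)
    kind = "symmetric" if before == after else "asymmetric"
    print(k, before, after, kind)
```

For odd k the two amounts match, which is what allows a single padding value per dimension.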
In the following example, we create a two-dimensional convolutional layer whose kernel has a height and width of 3, and set the padding on each side of the input height and width to 1. Given an input with a height and width of 8, we find that the height and width of the output are also 8.

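import torch
from torch import nn

# Define a helper that applies a convolutional layer. It adds the batch and
# channel dimensions to the input and removes them from the output.
def comp_conv2d(conv2d, X):
    # (1, 1) means that the batch size and the number of channels are both 1
    X = X.view((1, 1) + X.shape)
    Y = conv2d(X)
    return Y.view(Y.shape[2:])  # drop the first two dimensions: batch and channel

# Note that 1 row or column is padded on each side, so 2 rows or columns are added in total
conv2d = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)
X = torch.rand(8, 8)
comp_conv2d(conv2d, X).shape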

Output: torch.Size([8, 8]) 
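The earlier claim that Y[i,j] comes from the input window centered on X[i,j] can be checked numerically. The sketch below is an illustration, not part of the original notes: it disables the bias (bias=False) so that the layer output is the pure cross-correlation, and the position (i, j) is chosen arbitrarily.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
conv2d = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
X = torch.rand(8, 8)

with torch.no_grad():
    Y = conv2d(X[None, None])[0, 0]    # layer output, shape (8, 8)
    K = conv2d.weight[0, 0]            # the 3x3 kernel
    X_padded = F.pad(X, (1, 1, 1, 1))  # zero padding, shape (10, 10)

    i, j = 4, 2  # any output position
    # In padded coordinates X[i, j] sits at (i + 1, j + 1), so the 3x3
    # window centered on it starts at row i and column j.
    window = X_padded[i:i + 3, j:j + 3]
    manual = (window * K).sum()

print(torch.allclose(Y[i, j], manual, atol=1e-5))
```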

When the height and width of the convolution kernel differ, we can still make the output and input have the same height and width by setting different padding amounts for the height and the width.

# Use a kernel with height 5 and width 3. The padding on each side of the height and width is 2 and 1, respectively
conv2d = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(5,3), padding=(2, 1))
comp_conv2d(conv2d, X).shape

Output: torch.Size([8, 8]) 

Stride

The convolution window starts at the top left of the input array and slides over it from left to right and from top to bottom. The number of rows and columns the window moves on each slide is called the stride.

Figure 5.3 shows a two-dimensional cross-correlation operation with a stride of 3 on the height and a stride of 2 on the width. When computing the second element of the first output column, the convolution window slides down by 3 rows; when computing the second element of the first output row, the window slides right by 2 columns. When the window slides another 2 columns to the right, the remaining input elements cannot fill the window, so no result is output.
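The sliding behaviour described above can be sketched by listing the positions at which a full window still fits. The sizes used here (a padded 5×5 input and a 2×2 kernel) are assumptions for illustration, and window_starts is a hypothetical helper name.

```python
def window_starts(n, k, s):
    """Start indices at which a window of size k fits inside an input of
    size n when the window moves s positions at a time."""
    return list(range(0, n - k + 1, s))

# Assumed sizes for illustration: a padded 5x5 input and a 2x2 kernel.
rows = window_starts(5, 2, 3)  # stride 3 along the height
cols = window_starts(5, 2, 2)  # stride 2 along the width

print(rows)                  # [0, 3]
print(cols)                  # [0, 2]
print(len(rows), len(cols))  # 2 2
```

A further step to the right would start at column 4 and need columns 4 and 5, but only columns 0 to 4 exist, so that position produces no output.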

Below we set the stride on both the height and width to 2, which halves the height and width of the output relative to the input.

conv2d = nn.Conv2d(1, 1, kernel_size=3, padding=1, stride=2)
comp_conv2d(conv2d, X).shape

Output: torch.Size([4, 4]) 

Next is a slightly more complicated example.

conv2d = nn.Conv2d(1, 1, kernel_size=(3, 5), padding=(0, 1), stride=(3, 4))
comp_conv2d(conv2d, X).shape

Output: torch.Size([2, 2]) 
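The output sizes in all of the examples above follow from the standard size rule for a convolution without dilation: floor((n + 2p - k) / s) + 1 along each dimension, where n is the input size, k the kernel size, p the padding on each side and s the stride. The helpers below are a hypothetical sketch of that arithmetic, not a PyTorch API.

```python
def conv_out_size(n, k, p=0, s=1):
    """Output size along one dimension: input n, kernel k,
    padding p on each side, stride s (no dilation)."""
    return (n + 2 * p - k) // s + 1

def conv_out_shape(shape, kernel, padding=(0, 0), stride=(1, 1)):
    """Apply conv_out_size to the height and the width."""
    return tuple(
        conv_out_size(n, k, p, s)
        for n, k, p, s in zip(shape, kernel, padding, stride)
    )

print(conv_out_shape((8, 8), (3, 3), (1, 1)))          # (8, 8)
print(conv_out_shape((8, 8), (5, 3), (2, 1)))          # (8, 8)
print(conv_out_shape((8, 8), (3, 3), (1, 1), (2, 2)))  # (4, 4)
print(conv_out_shape((8, 8), (3, 5), (0, 1), (3, 4)))  # (2, 2)
```

For the last example the height gives floor((8 + 0 - 3) / 3) + 1 = 2 and the width gives floor((8 + 2 - 5) / 4) + 1 = 2.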

Summary

  • Padding can increase the height and width of the output. This is often used to make the output have the same height and width as the input.
  • Stride can reduce the height and width of the output. For example, the output height and width can be only 1/n of the input height and width (n is an integer greater than 1).

Origin blog.csdn.net/dujuancao11/article/details/108490243