"Dive into Deep Learning" (PyTorch Edition) 6.3 Padding and Stride

6.3.1 Padding

The convolution kernels we use are usually small, so each convolution loses only a few pixels at the border. When many convolutional layers are applied in sequence, however, the accumulated loss becomes substantial. Padding solves this problem by adding extra pixels (typically zeros) around the border of the input.
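To make the accumulated loss concrete, here is a minimal sketch in plain Python. The helper name `size_after_layers` is an illustrative assumption and not part of the section's code. It relies only on the fact that an unpadded, stride-1 convolution with a k×k kernel shrinks each side by k − 1 pixels.

```python
def size_after_layers(n, k, num_layers):
    """Side length after stacking unpadded, stride-1 convolutions with a k x k kernel."""
    for _ in range(num_layers):
        n = n - k + 1  # each layer removes k - 1 pixels along this side
    return n

# Example: a 240-pixel side passed through ten 5x5 convolutions
print(size_after_layers(240, 5, 10))  # 240 - 10 * 4 = 200
```

Ten such layers already discard 40 pixels per side, which is the kind of loss padding is meant to prevent.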

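With a total of $p_h$ rows of padding (split between top and bottom) and $p_w$ columns of padding (split between left and right), an $n_h\times n_w$ input and a $k_h\times k_w$ kernel give an output of shape $(n_h-k_h+p_h+1)\times(n_w-k_w+p_w+1)$.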

import torch
from torch import nn

def comp_conv2d(conv2d, X):
    X = X.reshape((1, 1) + X.shape)  # Batch size and channel count are both 1 (this is tuple concatenation)
    Y = conv2d(X)
    return Y.reshape(Y.shape[2:])  # Drop the first two dimensions (batch and channel)

conv2d = nn.Conv2d(1, 1, kernel_size=3, padding=1)
X = torch.rand(size=(8, 8))
comp_conv2d(conv2d, X).shape  # Without padding the shape would be (6, 6)
torch.Size([8, 8])
conv2d = nn.Conv2d(1, 1, kernel_size=(5, 3), padding=(2, 1))  # Height and width can also be padded by different amounts
comp_conv2d(conv2d, X).shape  # Without padding the shape would be (4, 6)
torch.Size([8, 8])
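The shapes printed above can be checked against the formula $(n_h-k_h+p_h+1)\times(n_w-k_w+p_w+1)$ without running PyTorch. The sketch below is plain Python, and the function name `padded_output_shape` is a hypothetical helper. Note that the `padding` argument of `nn.Conv2d` counts the rows or columns added on each side, so `padding=1` corresponds to a total of $p_h=p_w=2$, and `padding=(2, 1)` to $p_h=4$, $p_w=2$.

```python
def padded_output_shape(n_h, n_w, k_h, k_w, p_h=0, p_w=0):
    """Output (height, width) of a stride-1 convolution.

    p_h and p_w are the TOTAL padding rows/columns (both sides combined).
    """
    return (n_h - k_h + p_h + 1, n_w - k_w + p_w + 1)

print(padded_output_shape(8, 8, 3, 3))        # (6, 6): 3x3 kernel, no padding
print(padded_output_shape(8, 8, 3, 3, 2, 2))  # (8, 8): corresponds to padding=1
print(padded_output_shape(8, 8, 5, 3))        # (4, 6): 5x3 kernel, no padding
print(padded_output_shape(8, 8, 5, 3, 4, 2))  # (8, 8): corresponds to padding=(2, 1)
```

In both padded cases the total padding equals the kernel size minus 1, which is why the output keeps the 8×8 input shape.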

6.3.2 Stride

conv2d = nn.Conv2d(1, 1, kernel_size=3, padding=1, stride=2)  # A stride of 2 halves the height and width
comp_conv2d(conv2d, X).shape
torch.Size([4, 4])

With a vertical stride $s_h$ and a horizontal stride $s_w$, the output shape becomes $\left\lfloor(n_h-k_h+p_h+s_h)/s_h\right\rfloor\times\left\lfloor(n_w-k_w+p_w+s_w)/s_w\right\rfloor$.
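As a quick check of the strided formula, the following plain-Python sketch evaluates it with integer floor division. The name `conv_output_shape` is an assumed helper for illustration, and the padding values are totals over both sides, as in the formula.

```python
def conv_output_shape(n, k, p, s):
    """Output (height, width) for input size n, kernel size k,
    total padding p and stride s, each given as an (h, w) tuple."""
    return tuple(
        (n_i - k_i + p_i + s_i) // s_i  # floor((n - k + p + s) / s)
        for n_i, k_i, p_i, s_i in zip(n, k, p, s)
    )

# The stride-2 example: 8x8 input, 3x3 kernel, padding=1 per side (total 2), stride 2
print(conv_output_shape((8, 8), (3, 3), (2, 2), (2, 2)))  # (4, 4)

# With stride 1 the formula reduces to n - k + p + 1
print(conv_output_shape((8, 8), (3, 3), (2, 2), (1, 1)))  # (8, 8)
```

The first result agrees with the `torch.Size([4, 4])` reported for the stride-2 layer.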

conv2d = nn.Conv2d(1, 1, kernel_size=(3, 5), padding=(0, 1), stride=(3, 4))  # In practice, unequal strides or padding are rarely used
comp_conv2d(conv2d, X).shape
torch.Size([2, 2])

Exercises

(1) For the last example in this section, calculate its output shape to see if it is consistent with the experimental results.

Because $p_h$ and $p_w$ denote total padding, `padding=(0, 1)` corresponds to $p_h=0$ and $p_w=2$ (one column on each side):

\begin{align}
\text{height}\times\text{width}&=\left\lfloor(n_h-k_h+p_h+s_h)/s_h\right\rfloor\times\left\lfloor(n_w-k_w+p_w+s_w)/s_w\right\rfloor\\
&=\left\lfloor(8-3+0+3)/3\right\rfloor\times\left\lfloor(8-5+2+4)/4\right\rfloor\\
&=\left\lfloor8/3\right\rfloor\times\left\lfloor9/4\right\rfloor\\
&=2\times2
\end{align}

This agrees with the experimental result, torch.Size([2, 2]).


(2) For the experiments in this section, try other padding and stride combinations.

Omitted.


(3) For audio signals, what does a stride of 2 mean?

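It amounts to sampling the signal: with a stride of 2 the kernel is evaluated only at every second position, so the output is downsampled by a factor of 2 along the time axis.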
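The sampling interpretation can be illustrated with a small plain-Python sketch. The function `corr1d`, the sample values, and the two-tap kernel are all hypothetical choices made for this example. It shows that a stride-2 one-dimensional cross-correlation returns every second output of the stride-1 version.

```python
def corr1d(signal, kernel, stride=1):
    """Unpadded 1D cross-correlation evaluated every `stride` positions."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]

signal = [0, 1, 2, 3, 4, 5, 6, 7]  # made-up audio samples
kernel = [1, 1]                    # made-up two-tap kernel

full = corr1d(signal, kernel, stride=1)
half = corr1d(signal, kernel, stride=2)
print(full)               # [1, 3, 5, 7, 9, 11, 13]
print(half)               # [1, 5, 9, 13]
print(half == full[::2])  # True
```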


(4) What is the computational advantage of a stride greater than 1?

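It reduces the amount of computation. With a stride of $s$ the kernel is applied at only about $1/s$ of the positions along each dimension, so the output is smaller and fewer multiply-accumulate operations are needed, both in this layer and in the layers that follow.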
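A rough count makes the saving visible. The sketch below is plain Python with an assumed helper name, `conv_macs`. It counts multiply-accumulate operations for a single-channel convolution as (number of output elements) × (kernel size), ignoring bias and any implementation-level optimizations.

```python
def conv_macs(n_h, n_w, k_h, k_w, p_h, p_w, s_h, s_w):
    """Multiply-accumulate count for a single-channel 2D convolution.

    p_h and p_w are total padding (both sides combined).
    """
    out_h = (n_h - k_h + p_h + s_h) // s_h
    out_w = (n_w - k_w + p_w + s_w) // s_w
    return out_h * out_w * k_h * k_w

# 8x8 input, 3x3 kernel, total padding 2 in each dimension
print(conv_macs(8, 8, 3, 3, 2, 2, 1, 1))  # 8 * 8 * 9 = 576
print(conv_macs(8, 8, 3, 3, 2, 2, 2, 2))  # 4 * 4 * 9 = 144
```

Under these assumptions, moving from stride 1 to stride 2 in both dimensions cuts the work by a factor of 4.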


Origin blog.csdn.net/qq_43941037/article/details/132953538