[Deep Learning Knowledge] Zero Padding Operation in Convolution


1. Convolution input and output size formula

Out = floor((In + 2P − D×(K−1) − 1)/S + 1)

where In is the input size, Out the output size, K the kernel size, S the stride, P the padding, and D the dilation.
This is the relatively complete output size formula, taking stride, padding, and dilation into account. The floor brackets indicate rounding down: when the remaining part of the input image is smaller than the (effective) convolution kernel, that leftover part is simply discarded.
In fact, from another perspective, a dilated convolution can be viewed as changing the effective kernel size K via K -> D×(K−1)+1, so the formula can be written more concisely as
Out = floor((In + 2P − K)/S+1)
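
As a quick sanity check, here is a minimal sketch of the formula as a helper function (the name conv_out and the example numbers are my own, not from the original post):

import math

def conv_out(in_size, k, s=1, p=0, d=1):
    # General output size: floor((In + 2P - D*(K-1) - 1) / S + 1)
    return math.floor((in_size + 2 * p - d * (k - 1) - 1) / s + 1)

print(conv_out(24, k=7, s=2, p=3))  # 12
print(conv_out(25, k=7, s=2, p=3))  # 13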

1.1 TensorFlow version

In the TensorFlow version, padding is specified directly through the padding mode parameter, which can be 'SAME' or 'VALID'. The former pads zeros on all four sides so that the output size stays unchanged (or is reduced by exactly a factor of the stride), which is very commonly used; the latter performs no padding at all, which is equivalent to the general formula above with P = 0.
SAME: Out = ceil(In/S)
VALID: Out = ceil((In − K + 1)/S)
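
A minimal sketch of the two modes (assuming TensorFlow 2.x and the tf.keras.layers.Conv2D layer; the shapes in the comments follow the formulas above):

import tensorflow as tf

x = tf.random.normal([1, 25, 25, 3])                                # NHWC input
same = tf.keras.layers.Conv2D(64, 7, strides=2, padding='same')(x)
valid = tf.keras.layers.Conv2D(64, 7, strides=2, padding='valid')(x)
print(same.shape)   # (1, 13, 13, 64): ceil(25/2) = 13
print(valid.shape)  # (1, 10, 10, 64): ceil((25 - 7 + 1)/2) = 10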

1.2 PyTorch version

>>> import torch
>>> from torch import nn
>>> m = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
>>> input = torch.randn(20,3,24,24)
>>> m(input).shape
torch.Size([20, 64, 12, 12])
>>> input = torch.randn(20,3,25,25)
>>> m(input).shape
torch.Size([20, 64, 13, 13])
>>> input = torch.randn(20,3,24,24)
>>> m = nn.Conv2d(3, 64, kernel_size=6, stride=2, padding=3)
>>> m(input).shape
torch.Size([20, 64, 13, 13])
>>> m = nn.Conv2d(3, 64, kernel_size=6, stride=2, padding=2)
>>> m(input).shape
torch.Size([20, 64, 12, 12])

>>> input = torch.randn(20,3,25,25)
>>> m = nn.Conv2d(3, 64, kernel_size=6, stride=2, padding=3)
>>> m(input).shape
torch.Size([20, 64, 13, 13])
>>> m = nn.Conv2d(3, 64, kernel_size=6, stride=2, padding=2)
>>> m(input).shape
torch.Size([20, 64, 12, 12])

Here it can be observed, if you need to keep the output size (i.e. reproduce TensorFlow's SAME behavior):

  • For odd convolution kernels, setting padding=(k−1)/2 achieves the SAME effect: the output size stays the same or is reduced by exactly a factor of the stride (odd input sizes round up).
  • For an even convolution kernel with an even input size, padding=floor((k−1)/2) achieves the SAME effect.
  • For an even convolution kernel with an odd input size, padding=ceil((k−1)/2) achieves the SAME effect.

Therefore, it is recommended to avoid even-sized convolution kernels. Then you only need to remember padding=(k−1)/2 to reproduce the SAME effect from TensorFlow. Substituting this padding into the general formula at the beginning gives Out = floor((In − 1)/S + 1), which is equivalent to the SAME formula Out = ceil(In/S). See the brute-force verification below.

>>> import numpy as np
>>> a = lambda x: np.ceil(x / 5)             # SAME formula with stride S = 5
>>> b = lambda x: np.floor((x - 1) / 5 + 1)  # general formula with P = (K-1)/2
>>> c = [np.random.randint(6, 100) for i in range(10)]
>>> [a(rand) == b(rand) for rand in c]
[True, True, True, True, True, True, True, True, True, True]

2. Summary

Although I spent quite a while working through this formula, in practice the SAME behavior is what gets used most of the time, so just remember the core points:

  • Use odd-sized convolution kernels with padding=(k−1)/2.
  • For dilated (atrous) convolution, first substitute K -> D×(K−1)+1, which gives padding=D×(k−1)/2, as in the sketch below.
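
For example (an illustrative sketch of my own, not from the original post): with kernel_size=3 and dilation=2, the effective kernel size is 2×(3−1)+1 = 5, so padding=2 keeps the SAME behavior:

>>> import torch
>>> from torch import nn
>>> m = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=2, dilation=2)
>>> m(torch.randn(20, 3, 24, 24)).shape
torch.Size([20, 64, 24, 24])
>>> m = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=2, dilation=2)
>>> m(torch.randn(20, 3, 24, 24)).shape
torch.Size([20, 64, 12, 12])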


Origin blog.csdn.net/tobefans/article/details/125428881