A plain-language, detailed look at deconvolution and the important parameters of nn.ConvTranspose2d

The role of deconvolution

Traditional convolution usually turns a large image into a smaller one; deconvolution (transposed convolution) is the reverse: it turns a small image into a larger one.

What is it good for? Quite a lot. For example, in a generative adversarial network (GAN), we feed the network a vector and ask it to generate an image:

[Figure: a GAN generator expanding a latent vector into an image]

So we need a way to keep expanding this vector until it finally reaches the size of the image.
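
In practice this expansion is usually done by stacking several nn.ConvTranspose2d layers. Below is a minimal, illustrative sketch (not from the original post; all channel counts and layer sizes are my own assumptions) of a DCGAN-style generator that turns a 100-dimensional vector into a 3x64x64 image:

import torch
import torch.nn as nn

generator = nn.Sequential(
    nn.ConvTranspose2d(100, 256, kernel_size=4, stride=1, padding=0),  # 1x1   -> 4x4
    nn.ReLU(),
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),  # 4x4   -> 8x8
    nn.ReLU(),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),   # 8x8   -> 16x16
    nn.ReLU(),
    nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),    # 16x16 -> 32x32
    nn.ReLU(),
    nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),     # 32x32 -> 64x64
    nn.Tanh(),
)

z = torch.randn(1, 100, 1, 1)  # the latent vector, viewed as a 1x1 "image"
print(generator(z).size())     # torch.Size([1, 3, 64, 64])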


Several concepts of padding in convolution

Before digging into deconvolution, let's review a few padding concepts from traditional convolution, because deconvolution reuses them later.

No Padding


No Padding means the padding is 0, so the image shrinks after convolution, as you probably already know.

In the figures in this article, the blue maps are the input images and the green maps are the output images.

Half(Same) Padding

[Figure: half (same) padding convolution]
Half Padding is also called Same Padding. Let's start with "Same": it means the output image has the same size as the input image. With a stride of 1, making the input and output sizes equal requires $p = \lfloor k/2 \rfloor$, which is where "Half" comes from: the padding is half of the kernel_size.

Same padding is supported directly in PyTorch, for example:

import torch
import torch.nn as nn

inputs = torch.rand(1, 3, 32, 32)
outputs = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=5, padding='same')(inputs)
outputs.size()
torch.Size([1, 3, 32, 32])
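
Equivalently, continuing the snippet above, you can compute the padding yourself as $\lfloor k/2 \rfloor$; for stride 1 and an odd kernel this gives the same result as 'same':

k = 5
outputs = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=k, padding=k // 2)(inputs)
outputs.size()
torch.Size([1, 3, 32, 32])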

Full Padding

[Figure: full padding convolution]

Full Padding is reached when $p = k-1$. Why? Look at the figure above, where $k=3$ and $p=2$: during the very first window of the convolution, only a single input unit participates. Suppose instead that $p=3$; then some windows would not cover any input units at all and would simply produce 0, which is the same as not computing them.

We can verify this with PyTorch. First, a Full Padding convolution:

inputs = torch.rand(1, 1, 2, 2)
outputs = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=2, bias=False)(inputs)
outputs
tensor([[[[-0.0302, -0.0356, -0.0145, -0.0203],
          [-0.0515, -0.2749, -0.0265, -0.1281],
          [ 0.0076, -0.1857, -0.1314, -0.0838],
          [ 0.0187,  0.2207,  0.1328, -0.2150]]]],
       grad_fn=<SlowConv2DBackward0>)

The output looks normal. Now let's increase the padding to 3:

inputs = torch.rand(1, 1, 2, 2)
outputs = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=3, bias=False)(inputs)
outputs
tensor([[[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.1262,  0.2506,  0.1761,  0.3091,  0.0000],
          [ 0.0000,  0.3192,  0.6019,  0.5570,  0.3143,  0.0000],
          [ 0.0000,  0.1465,  0.0853, -0.1829, -0.1264,  0.0000],
          [ 0.0000, -0.0703, -0.2774, -0.3261, -0.1201,  0.0000],
          [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]]]],
       grad_fn=<SlowConv2DBackward0>)

You can see an extra ring of 0s around the output, which means some of the convolution windows never touched the input image: wasted computation.
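
As a quick size check, the standard convolution output-size formula with stride 1 is $o = i + 2p - k + 1$. With $i=2$ and $k=3$, padding $p=2$ gives $o = 2 + 4 - 3 + 1 = 4$ (the 4x4 output above), while $p=3$ gives $o = 2 + 6 - 3 + 1 = 6$ (the 6x6 output whose outer ring is all zeros).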


Deconvolution

Deconvolution is actually the same as convolution, except that the parameter correspondence has changed a bit. For example:

[Figure: transposed convolution with padding=0]
This is a deconvolution with padding=0. You may object: the figure clearly pads by 2, so how can the padding be 0? Read on.

Padding parameter in deconvolution

In traditional convolution, the padding range is $[0, k-1]$: $p=0$ is called No Padding, and $p=k-1$ is called Full Padding.

The padding $p'$ in deconvolution is exactly the opposite, i.e. $p' = k-1-p$. That is, passing $p'=0$ is equivalent to passing $p=k-1$, and passing $p'=k-1$ is equivalent to passing $p=0$.

We can verify this with the following experiment:

import numpy as np
import torch
import torch.nn as nn
from collections import OrderedDict

inputs = torch.rand(1, 1, 32, 32)
# Define the deconvolution. Here p'=2, which is Full Padding for a deconvolution.
transposed_conv = nn.ConvTranspose2d(in_channels=1, out_channels=1, kernel_size=3, padding=2, bias=False)
# Define the convolution. Here p=0, which is No Padding for a convolution.
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=0, bias=False)
# Make the deconvolution use the same kernel as the convolution. What is actually loaded
# is the convolution kernel flipped 180° spatially (its "transpose").
transposed_conv.load_state_dict(OrderedDict([('weight', torch.Tensor(np.array(conv.state_dict().get('weight'))[:, :, ::-1, ::-1].copy()))]))
# Forward passes
transposed_conv_outputs = transposed_conv(inputs)
conv_outputs = conv(inputs)

# Print the sizes of the convolution and deconvolution outputs
print("transposed_conv_outputs.size: ", transposed_conv_outputs.size())
print("conv_outputs.size: ", conv_outputs.size())

# Check whether their output values agree.
# (Because the weights were converted to numpy and back, the convolution and
# deconvolution parameters may differ by a tiny amount, so we cannot compare
# with == directly; this tolerance check is effectively the same as ==.)
(transposed_conv_outputs - conv_outputs) < 0.01
transposed_conv_outputs.size:  torch.Size([1, 1, 30, 30])
conv_outputs.size:  torch.Size([1, 1, 30, 30])

tensor([[[[True, True, True, True, True, True, True, True, True, True, True,
          ... (rest omitted)

As this example shows, deconvolution is really the same operation as convolution; the differences come down to a few points:

  1. When the deconvolution performs its convolution, it uses the transposed (spatially flipped) kernel, but we don't need to care about that.
  2. The deconvolution padding parameter $p'$ and the convolution padding parameter $p$ are related by $p' = k-1-p$. In other words, No Padding in convolution corresponds to Full Padding in deconvolution, and Full Padding in convolution corresponds to No Padding in deconvolution.
  3. Point 2 also suggests that $p'$ cannot be arbitrarily large, with a maximum of $k-1$ (the $p=0$ case). (Actually it can go beyond that, as we'll see.)

An aside; skip it if you are not interested. In point 3 above we said the maximum value of $p'$ is $k-1$, but if you experiment with PyTorch you will find that $p'$ can actually be larger than that. What happens behind the scenes is equivalent to cropping the original image.

In PyTorch's nn.Conv2d, padding cannot be negative (it raises an error). If you ever did need negative padding (admittedly an unusual requirement), you could achieve it with deconvolution, for example:

inputs = torch.ones(1, 1, 3, 3)
transposed_conv = nn.ConvTranspose2d(in_channels=1, out_channels=1, kernel_size=1, padding=1, bias=False)
print(transposed_conv.state_dict())
outputs = transposed_conv(inputs)
print(outputs)
OrderedDict([('weight', tensor([[[[0.7700]]]]))])
tensor([[[[0.7700]]]], grad_fn=<SlowConvTranspose2DBackward0>)

In the above example, what we pass to the network is an image:

$$\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$$

But we passed $p'=1$ and $k=1$, which is equivalent to $p = k-1-p' = -1$ in traditional convolution, i.e. Conv2d(padding=-1). When the convolution is performed, it actually operates on the image $[1]$ (because the surrounding ring has been cropped away), so the final output size is $(1,1,1,1)$.
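
We can check the cropping interpretation directly (a small sketch, not from the original post, reusing the torch / nn imports from earlier): crop one ring of pixels off the 3x3 input by hand and apply the same 1x1 kernel; the result matches the ConvTranspose2d output.

inputs = torch.ones(1, 1, 3, 3)
transposed_conv = nn.ConvTranspose2d(in_channels=1, out_channels=1, kernel_size=1, padding=1, bias=False)

cropped = inputs[:, :, 1:-1, 1:-1]               # 3x3 -> 1x1, the effect of padding = -1
manual = cropped * transposed_conv.weight.data   # a 1x1 kernel is just a scalar multiply
print(torch.allclose(transposed_conv(inputs), manual))
True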

This digression probably has no practical use; treat it as a way to better understand the padding parameter of deconvolution.



The stride parameter of deconvolution

The name "stride" is a bit misleading for deconvolution. The following figures show what it actually means:

[Figures: transposed convolution with stride=1 (left) and stride=2 (right)]

On the left is a deconvolution with stride=1 (called No Stride); on the right, stride=2. The difference is that zeros are inserted between the pixels of the original image. That is exactly what the stride parameter means in deconvolution: stride - 1 zeros are inserted between every two pixels of the input image.
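
We can verify this reading directly (a sketch, not from the original post): build the zero-inserted image by hand, pad it with $k-1-p'$ zeros on each side, and run an ordinary stride-1 convolution with the 180°-flipped kernel; it reproduces the ConvTranspose2d output.

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.rand(1, 1, 2, 2)
tconv = nn.ConvTranspose2d(in_channels=1, out_channels=1, kernel_size=3, stride=2, padding=0, bias=False)

# insert stride-1 = 1 zero between every two pixels: 2x2 -> 3x3
dilated = torch.zeros(1, 1, 3, 3)
dilated[:, :, ::2, ::2] = x
# pad with k-1-p' = 2 zeros on each side, then convolve (stride 1) with the flipped kernel
manual = F.conv2d(F.pad(dilated, (2, 2, 2, 2)), torch.flip(tconv.weight, dims=[2, 3]))
print(tconv(x).size(), manual.size())               # both torch.Size([1, 1, 5, 5])
print(torch.allclose(tconv(x), manual, atol=1e-6))  # True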

For example, if we deconvolve a 32x32 image with stride=3, two zeros are inserted between every two pixels, so each side of the original image grows to $32 + 31 \times 2 = 94$. Let's experiment with code:

inputs = torch.ones(1, 1, 32, 32)
transposed_conv = nn.ConvTranspose2d(in_channels=1, out_channels=1, kernel_size=3, padding=2, stride=3, bias=False)
outputs = transposed_conv(inputs)
print(outputs.size())
torch.Size([1, 1, 92, 92])

Let's do the math. Here I used the deconvolution's Full Padding (equivalent to not padding the edge of the original image at all), and stride=3, which inserts two zeros between every two pixels, so the original image becomes 94x94. The kernel is 3, so the final output size is $94 - 3 + 1 = 92$.
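
The same number comes out of the general output-size formula for nn.ConvTranspose2d given in the official documentation (with dilation = 1): $o = (i - 1) \times \text{stride} - 2 \times \text{padding} + \text{kernel\_size} + \text{output\_padding}$. Here $o = (32 - 1) \times 3 - 2 \times 2 + 3 + 0 = 92$.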


The output_padding parameter of deconvolution

Have you noticed that if convolution and deconvolution use the same parameters, and the convolution turns a size-$A$ image into a size-$B$ image, then the deconvolution turns a size-$B$ image into a size-$A$ image?

For example:

inputs = torch.rand(1, 1, 32, 32)
outputs = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=18, padding=3, stride=1)(inputs)
outputs.size()
torch.Size([1, 1, 21, 21])

Here the convolution turned the 32x32 image into 21x21. Now swap the convolution for a deconvolution (parameters unchanged) and feed it a 21x21 image:

inputs = torch.rand(1, 1, 21, 21)
outputs = nn.ConvTranspose2d(in_channels=1, out_channels=1, kernel_size=18, padding=3, stride=1)(inputs)
outputs.size()
torch.Size([1, 1, 32, 32])

See, deconvolution turns the 21x21 image back into 32x32, which is why it is called deconvolution.

But is that always the case? Let's look at another example:

inputs = torch.rand(1, 1, 7, 7)
outputs = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=0, stride=2)(inputs)
outputs.size()
torch.Size([1, 1, 3, 3])
inputs = torch.rand(1, 1, 8, 8)
outputs = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=0, stride=2)(inputs)
outputs.size()
torch.Size([1, 1, 3, 3])
inputs = torch.rand(1, 1, 3, 3)
outputs = nn.ConvTranspose2d(in_channels=1, out_channels=1, kernel_size=3, padding=0, stride=2)(inputs)
outputs.size()
torch.Size([1, 1, 7, 7])

Above we convolved both a 7x7 and an 8x8 image, and both came out 3x3. So the deconvolution is ambiguous, and by default it maps back to 7x7. The reason can be seen in the figure below:

[Figure: stride-2 convolution on an 8x8 input; the last row and column never enter any window]

As the figure shows, the rightmost column and bottom row of the 8x8 image never participate in the convolution: with a stride of 2, one more step would run off the edge of the image. So both 7x7 and 8x8 end up as 3x3.
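
The output-size formula tells the same story: with no padding, $o = \lfloor (i - k)/s \rfloor + 1$, so $\lfloor (7-3)/2 \rfloor + 1 = 3$ and $\lfloor (8-3)/2 \rfloor + 1 = 3$. The floor is exactly where the information about that last row and column is lost.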

If we want the 3x3 image to deconvolve to 8x8 instead of 7x7, we have to add extra rows and columns at the edge of the output image, and how many is specified by output_padding. So the job of output_padding is to add values along the right and bottom sides of the output image, to make up for what is lost when the stride is greater than 1. Note that output_padding must be smaller than stride.

For example:

inputs = torch.rand(1, 1, 3, 3)
outputs = nn.ConvTranspose2d(in_channels=1, out_channels=1, kernel_size=3, padding=0, stride=2, output_padding=1)(inputs)
outputs

[Output tensor omitted: a 1x1x8x8 tensor in which the value 0.2199 appears; see below]

I am not sure exactly where this 0.2199 comes from; I tested it and it is not the average value. (One plausible explanation, not verified in the original post: the newly added row and column receive no contribution from the input at all, so they may contain only the layer's bias term.)
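
In terms of sizes, output_padding=1 is what lets you recover the 8x8 shape from the example above; a small sketch:

import torch
import torch.nn as nn

x = torch.rand(1, 1, 8, 8)
down = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=0, stride=2)                                  # 8x8 -> 3x3
up_default = nn.ConvTranspose2d(in_channels=1, out_channels=1, kernel_size=3, padding=0, stride=2)                   # 3x3 -> 7x7
up_padded = nn.ConvTranspose2d(in_channels=1, out_channels=1, kernel_size=3, padding=0, stride=2, output_padding=1)  # 3x3 -> 8x8

h = down(x)
print(h.size(), up_default(h).size(), up_padded(h).size())
torch.Size([1, 1, 3, 3]) torch.Size([1, 1, 7, 7]) torch.Size([1, 1, 8, 8])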


Deconvolution Summary

  1. The function of deconvolution is to expand the original image

  2. Deconvolution is not much different from traditional convolution. The main differences are:

    2.1 The padding correspondence changes: the deconvolution padding parameter is $p' = k-1-p$, where $k$ is kernel_size and $p$ is the padding value of the traditional convolution;
    2.2 The meaning of the stride parameter is different: in deconvolution, stride means inserting zeros into the input image, stride-1 of them between every two pixels;
    2.3 Apart from these two parameters, there is no difference in the other parameters.

  3. If convolution and deconvolution are given the same parameters, and the convolution turns size A into size B, then the deconvolution turns size B into size A.

  4. output_padding adds values along the right and bottom sides of the output image, to make up for what is lost when the stride is greater than 1. output_padding must be smaller than stride.





References

Convolution arithmetic: https://github.com/vdumoulin/conv_arithmetic

A guide to convolution arithmetic for deep learning: https://arxiv.org/pdf/1603.07285.pdf

nn.ConvTranspose2d official documentation: https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html

What output_padding does in nn.ConvTranspose2d?: https://stackoverflow.com/questions/67096544/what-output-padding-does-in-nn-convtranspose2d
