Deconvolution, Upsampling, and Unpooling: my own thoughts (PyTorch functions) (3)

PS: Recently, while doing segmentation and reading model code on GitHub, I noticed that in the upscaling (small-to-large) part of the models, some use upsampling (bilinear interpolation) + convolution while others use deconvolution. Why the difference? Looking into it, I found a Zhihu answer that summarizes the reasons based on an article by an expert. Zhihu link: https://www.zhihu.com/question/328891283 . The relevant pictures were posted in the previous one or two articles.

This time we focus on comparing the results of deconvolution versus upsampling + convolution, borrowing the sample code from the Zhihu answer above:

import mxnet as mx

batch_size = 1
in_channel = 1
height = 2
width = 2

data_shape = (batch_size, in_channel, height, width)
data = mx.nd.ones(data_shape)
print(data)

out_channel = 1
kernel_size = 2
# MXNet's Deconvolution expects weights of shape (in_channel, out_channel, kH, kW)
deconv_weight_shape = (in_channel, out_channel, kernel_size, kernel_size)
deconv_weight = mx.nd.ones(deconv_weight_shape)

stride = 2
up_scale = 2
data_deconv = mx.nd.Deconvolution(data=data, weight=deconv_weight,
                                  kernel=(kernel_size, kernel_size),
                                  stride=(stride, stride),
                                  num_filter=out_channel)
print(data_deconv)

data_upsample = mx.nd.contrib.BilinearResize2D(data=data, scale_height=up_scale, scale_width=up_scale)
print(data_upsample)

conv_weight_shape = (out_channel, in_channel, kernel_size, kernel_size)
conv_weight = mx.nd.ones(conv_weight_shape)
pad = (kernel_size - 1) // 2  # integer division; plain / would give a float in Python 3
data_conv = mx.nd.Convolution(data=data_upsample, weight=conv_weight,
                              kernel=(kernel_size, kernel_size),
                              pad=(pad, pad), num_filter=out_channel, no_bias=True)
print(data_conv)

Here the convolution kernel is 2*2 with all weights set to 1, and the input is a 1*1*2*2 matrix of all 1s; the deconvolution stride is 2. Below are, in order: the original data, deconvolution of the original data, 2x bilinear interpolation of the original data, and 2x bilinear interpolation + convolution.


[[[[1. 1.]
   [1. 1.]]]]
<NDArray 1x1x2x2 @cpu(0)>

[[[[1. 1. 1. 1.]
   [1. 1. 1. 1.]
   [1. 1. 1. 1.]
   [1. 1. 1. 1.]]]]
<NDArray 1x1x4x4 @cpu(0)>

[[[[1. 1. 1. 1.]
   [1. 1. 1. 1.]
   [1. 1. 1. 1.]
   [1. 1. 1. 1.]]]]
<NDArray 1x1x4x4 @cpu(0)>

[[[[4. 4. 4.]
   [4. 4. 4.]
   [4. 4. 4.]]]]
<NDArray 1x1x3x3 @cpu(0)>
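Since the title promises PyTorch, here is my own sketch of this first experiment using PyTorch's functional API (assuming torch is installed); `conv_transpose2d` plays the role of MXNet's `Deconvolution`:

```python
import torch
import torch.nn.functional as F

x = torch.ones(1, 1, 2, 2)   # batch=1, channel=1, 2x2 of ones
w = torch.ones(1, 1, 2, 2)   # 2x2 kernel, all weights 1

# deconvolution (transposed convolution) with stride 2 -> 1x1x4x4 of ones
deconv = F.conv_transpose2d(x, w, stride=2)
print(deconv)

# 2x bilinear upsampling, then a stride-1 convolution -> 1x1x3x3 of fours
up = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
conv = F.conv2d(up, w, padding=(2 - 1) // 2)
print(conv)
```

The printed tensors match the MXNet outputs above: non-overlapping 2x2 kernel copies give all 1s for the deconvolution, and each 2x2 window of the upsampled ones sums to 4 for the convolution.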

Nothing interesting shows up yet. Change the convolution kernel to 3*3 and the original data to 1*1*4*4.

[[[[1. 1. 1. 1.]
   [1. 1. 1. 1.]
   [1. 1. 1. 1.]
   [1. 1. 1. 1.]]]]
<NDArray 1x1x4x4 @cpu(0)>

[[[[1. 1. 2. 1. 2. 1. 2. 1. 1.]
   [1. 1. 2. 1. 2. 1. 2. 1. 1.]
   [2. 2. 4. 2. 4. 2. 4. 2. 2.]
   [1. 1. 2. 1. 2. 1. 2. 1. 1.]
   [2. 2. 4. 2. 4. 2. 4. 2. 2.]
   [1. 1. 2. 1. 2. 1. 2. 1. 1.]
   [2. 2. 4. 2. 4. 2. 4. 2. 2.]
   [1. 1. 2. 1. 2. 1. 2. 1. 1.]
   [1. 1. 2. 1. 2. 1. 2. 1. 1.]]]]
<NDArray 1x1x9x9 @cpu(0)>

[[[[1. 1. 1. 1. 1. 1. 1. 1.]
   [1. 1. 1. 1. 1. 1. 1. 1.]
   [1. 1. 1. 1. 1. 1. 1. 1.]
   [1. 1. 1. 1. 1. 1. 1. 1.]
   [1. 1. 1. 1. 1. 1. 1. 1.]
   [1. 1. 1. 1. 1. 1. 1. 1.]
   [1. 1. 1. 1. 1. 1. 1. 1.]
   [1. 1. 1. 1. 1. 1. 1. 1.]]]]
<NDArray 1x1x8x8 @cpu(0)>

[[[[4. 6. 6. 6. 6. 6. 6. 4.]
   [6. 9. 9. 9. 9. 9. 9. 6.]
   [6. 9. 9. 9. 9. 9. 9. 6.]
   [6. 9. 9. 9. 9. 9. 9. 6.]
   [6. 9. 9. 9. 9. 9. 9. 6.]
   [6. 9. 9. 9. 9. 9. 9. 6.]
   [6. 9. 9. 9. 9. 9. 9. 6.]
   [4. 6. 6. 6. 6. 6. 6. 4.]]]]
<NDArray 1x1x8x8 @cpu(0)>

For comparison, change the convolution kernel to 4*4.


[[[[1. 1. 1. 1.]
   [1. 1. 1. 1.]
   [1. 1. 1. 1.]
   [1. 1. 1. 1.]]]]
<NDArray 1x1x4x4 @cpu(0)>

[[[[1. 1. 2. 2. 2. 2. 2. 2. 1. 1.]
   [1. 1. 2. 2. 2. 2. 2. 2. 1. 1.]
   [2. 2. 4. 4. 4. 4. 4. 4. 2. 2.]
   [2. 2. 4. 4. 4. 4. 4. 4. 2. 2.]
   [2. 2. 4. 4. 4. 4. 4. 4. 2. 2.]
   [2. 2. 4. 4. 4. 4. 4. 4. 2. 2.]
   [2. 2. 4. 4. 4. 4. 4. 4. 2. 2.]
   [2. 2. 4. 4. 4. 4. 4. 4. 2. 2.]
   [1. 1. 2. 2. 2. 2. 2. 2. 1. 1.]
   [1. 1. 2. 2. 2. 2. 2. 2. 1. 1.]]]]
<NDArray 1x1x10x10 @cpu(0)>

[[[[1. 1. 1. 1. 1. 1. 1. 1.]
   [1. 1. 1. 1. 1. 1. 1. 1.]
   [1. 1. 1. 1. 1. 1. 1. 1.]
   [1. 1. 1. 1. 1. 1. 1. 1.]
   [1. 1. 1. 1. 1. 1. 1. 1.]
   [1. 1. 1. 1. 1. 1. 1. 1.]
   [1. 1. 1. 1. 1. 1. 1. 1.]
   [1. 1. 1. 1. 1. 1. 1. 1.]]]]
<NDArray 1x1x8x8 @cpu(0)>

[[[[ 9. 12. 12. 12. 12. 12.  9.]
   [12. 16. 16. 16. 16. 16. 12.]
   [12. 16. 16. 16. 16. 16. 12.]
   [12. 16. 16. 16. 16. 16. 12.]
   [12. 16. 16. 16. 16. 16. 12.]
   [12. 16. 16. 16. 16. 16. 12.]
   [ 9. 12. 12. 12. 12. 12.  9.]]]]
<NDArray 1x1x7x7 @cpu(0)>

Comparing the second and third experiments: with stride 2, a 3*3 deconvolution kernel produces a checkerboard pattern of 4s, a value that appears nowhere in the original data, i.e., noise. With a 4*4 kernel (evenly divisible by the stride), no such artifact appears after deconvolution. Upsampling + convolution never has this problem. So upsampling + convolution is the simpler choice, while direct deconvolution requires care at design time (e.g., choosing a kernel size divisible by the stride) to avoid checkerboard artifacts.
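The checkerboard can be reproduced without MXNet. A minimal NumPy sketch of my own (single channel, no bias) implements transposed convolution as a scatter-add: each input element stamps a scaled copy of the kernel into the output.

```python
import numpy as np

def conv_transpose2d(x, w, stride):
    """Transposed convolution (deconvolution) for one channel, no bias:
    each input element scatters a copy of the kernel into the output."""
    k = w.shape[0]
    h, wd = x.shape
    out = np.zeros(((h - 1) * stride + k, (wd - 1) * stride + k))
    for i in range(h):
        for j in range(wd):
            out[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * w
    return out

x = np.ones((4, 4))
out3 = conv_transpose2d(x, np.ones((3, 3)), stride=2)  # kernel 3, stride 2
out4 = conv_transpose2d(x, np.ones((4, 4)), stride=2)  # kernel 4, stride 2
print(out3)  # 9x9 checkerboard of 1s, 2s and 4s: 3 is not divisible by 2
print(out4)  # 10x10, smooth interior of 4s: 4 is divisible by 2
```

With an all-ones input and kernel, each output value is just the number of overlapping kernel copies at that position; when the kernel size is not divisible by the stride, that overlap count alternates, which is exactly the checkerboard.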

As for why I included the first case: it resolves an earlier confusion of mine. The stride of 2 is not the step by which the kernel moves during deconvolution (during deconvolution the kernel always moves by 1); it is the spacing between the original data elements after interpolation. In the figure below, the left shows convolution and the right shows deconvolution: for deconvolution, the input is first dilated with stride 2 (usually by inserting zeros), so adjacent elements end up 2 apart, and the convolution itself is then computed with a moving step of 1.
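That description can be checked numerically. In this sketch of mine (hypothetical helper names), inserting zeros with stride 2 and then running an ordinary stride-1 cross-correlation reproduces the 9*9 deconvolution output from the second experiment:

```python
import numpy as np

def dilate(x, stride):
    """Insert (stride - 1) zeros between elements -- what 'stride' actually
    means on the input side of a deconvolution."""
    h, w = x.shape
    out = np.zeros(((h - 1) * stride + 1, (w - 1) * stride + 1))
    out[::stride, ::stride] = x
    return out

def corr_step1(x, w):
    """Plain valid cross-correlation with a moving step of 1."""
    k = w.shape[0]
    out = np.zeros((x.shape[0] - k + 1, x.shape[1] - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + k, j:j + k] * w)
    return out

x = np.ones((4, 4))
w = np.ones((3, 3))               # symmetric kernel, so flipping it is a no-op
padded = np.pad(dilate(x, 2), 2)  # pad by kernel_size - 1 on each side ("full" mode)
result = corr_step1(padded, w)
print(result)                     # same 9x9 checkerboard as the stride-2 deconvolution
```

The zeros inserted by `dilate` are why some kernel positions see fewer original elements than others, which is another way to view the checkerboard.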

Finally, here is the blog post by the experts: https://distill.pub/2016/deconv-checkerboard/

This was explained clearly back in 2016, and I only figured it out now!!!


Origin: blog.csdn.net/qq_36401512/article/details/103294062