Deconvolution understanding and derivation

References: "How to explain deconvolution in an easy-to-understand manner?" (Zhihu); "[Basic knowledge learning] Convolution and deconvolution study notes" (Zhihu)

1. Concept

Deconvolution (more precisely, transposed convolution) is a special kind of forward convolution. It first enlarges the input image by padding it with zeros in a proportion determined by the stride and padding, then rotates the convolution kernel by 180°, and finally performs an ordinary forward convolution.
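The three steps above can be sketched in NumPy. This is a minimal sketch, assuming a single-channel square kernel and the maximum zero padding of k-1 layers; the function names are illustrative and not taken from the original article. It compares the "pad, rotate, forward-convolve" route with a direct scatter-style definition of transposed convolution.

```python
import numpy as np

def correlate2d_valid(x, w):
    """Forward convolution as used in deep learning (no kernel flip), stride 1, no padding."""
    k = w.shape[0]
    out_h = x.shape[0] - k + 1
    out_w = x.shape[1] - k + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(x[r:r + k, c:c + k] * w)
    return out

def transposed_conv2d(y, w, stride=1):
    """Transposed convolution built from zero insertion, padding, kernel rotation, forward convolution."""
    k = w.shape[0]
    h, wd = y.shape
    # Step 1: insert stride-1 zeros between neighbouring elements.
    up = np.zeros(((h - 1) * stride + 1, (wd - 1) * stride + 1))
    up[::stride, ::stride] = y
    # Step 2: pad k-1 layers of zeros around the outside (assumed maximum padding).
    up = np.pad(up, k - 1)
    # Step 3: rotate the kernel by 180 degrees.
    w_rot = w[::-1, ::-1]
    # Step 4: ordinary forward convolution with step size 1.
    return correlate2d_valid(up, w_rot)

def transposed_conv2d_scatter(y, w, stride=1):
    """Reference definition: every input element adds a scaled copy of the kernel to the output."""
    k = w.shape[0]
    h, wd = y.shape
    out = np.zeros(((h - 1) * stride + k, (wd - 1) * stride + k))
    for r in range(h):
        for c in range(wd):
            out[r * stride:r * stride + k, c * stride:c * stride + k] += y[r, c] * w
    return out

rng = np.random.default_rng(0)
y = rng.standard_normal((2, 2))
w = rng.standard_normal((3, 3))
print(transposed_conv2d(y, w, stride=1).shape)  # 2x2 input, 3x3 kernel
print(np.allclose(transposed_conv2d(y, w, 2), transposed_conv2d_scatter(y, w, 2)))
```

Both routes should give the same array, which is why deconvolution can be described as a special forward convolution.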

Figure 1 Deconvolution principle diagram (stride=1)

Figure 2 Deconvolution principle diagram (stride=2)

2. Mathematical derivation

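Assume that the input image size is 4×4, and the element matrix is:

\small input=\begin{bmatrix} x_{1} & x_{2} & x_{3} & x_{4} \\ x_{5} & x_{6} & x_{7} & x_{8} \\ x_{9} & x_{10} & x_{11} & x_{12} \\ x_{13} & x_{14} & x_{15} & x_{16} \end{bmatrix}

The convolution kernel size is 3×3, and the element matrix is:

\small kernel=\begin{bmatrix} w_{0,0} & w_{0,1} & w_{0,2} \\ w_{1,0} & w_{1,1} & w_{1,2} \\ w_{2,0} & w_{2,1} & w_{2,2} \end{bmatrix}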

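Let stride = 1 and padding = 0, that is, i = 4, k = 3, s = 1, p = 0. According to the convolution formula \small o=\left\lfloor \frac{i+2p-k}{s} \right\rfloor+1, the size of the output image is 2×2:

\small output=\begin{bmatrix} y_{1} & y_{2} \\ y_{3} & y_{4} \end{bmatrix}

where, for example, \small y_{1}=w_{0,0}x_{1}+w_{0,1}x_{2}+w_{0,2}x_{3}+w_{1,0}x_{5}+w_{1,1}x_{6}+w_{1,2}x_{7}+w_{2,0}x_{9}+w_{2,1}x_{10}+w_{2,2}x_{11}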

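Expand the element matrix of the input into a column vector X:

\small X=\begin{bmatrix} x_{1} & x_{2} & \cdots & x_{16} \end{bmatrix}^{T}

Expand the element matrix of the output image into a column vector Y:

\small Y=\begin{bmatrix} y_{1} & y_{2} & y_{3} & y_{4} \end{bmatrix}^{T}

For the input vector X and the output vector Y, the convolution can be written as a matrix operation: \small Y=CX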

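Through derivation, we can get the 4×16 sparse matrix C:

\small C=\begin{bmatrix} w_{0,0} & w_{0,1} & w_{0,2} & 0 & w_{1,0} & w_{1,1} & w_{1,2} & 0 & w_{2,0} & w_{2,1} & w_{2,2} & 0 & 0 & 0 & 0 & 0 \\ 0 & w_{0,0} & w_{0,1} & w_{0,2} & 0 & w_{1,0} & w_{1,1} & w_{1,2} & 0 & w_{2,0} & w_{2,1} & w_{2,2} & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & w_{0,0} & w_{0,1} & w_{0,2} & 0 & w_{1,0} & w_{1,1} & w_{1,2} & 0 & w_{2,0} & w_{2,1} & w_{2,2} & 0 \\ 0 & 0 & 0 & 0 & 0 & w_{0,0} & w_{0,1} & w_{0,2} & 0 & w_{1,0} & w_{1,1} & w_{1,2} & 0 & w_{2,0} & w_{2,1} & w_{2,2} \end{bmatrix}

Multiplying the first row of C by the input vector X gives y1, multiplying the second row by X gives y2, and so on.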
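The matrix form Y=CX can be checked numerically. The following is a minimal sketch, assuming row-major flattening of the 4×4 input; the helper name `conv_matrix` and the concrete numbers are illustrative only.

```python
import numpy as np

def conv_matrix(w, in_size):
    """Build the sparse matrix C for a stride-1, padding-0 convolution on an in_size x in_size input."""
    k = w.shape[0]
    out_size = in_size - k + 1
    C = np.zeros((out_size * out_size, in_size * in_size))
    for r in range(out_size):
        for c in range(out_size):
            row = r * out_size + c  # index of this output element in Y
            for m in range(k):
                for n in range(k):
                    col = (r + m) * in_size + (c + n)  # index of the input element in X
                    C[row, col] = w[m, n]
    return C

x = np.arange(1.0, 17.0).reshape(4, 4)   # stands in for x1..x16
w = np.arange(1.0, 10.0).reshape(3, 3)   # stands in for w_{0,0}..w_{2,2}

C = conv_matrix(w, 4)
Y = C @ x.reshape(-1)                    # Y = C X

# Direct sliding-window convolution for comparison.
direct = np.array([[np.sum(x[r:r + 3, c:c + 3] * w) for c in range(2)] for r in range(2)])

print(C.shape)                               # (4, 16)
print(np.allclose(Y.reshape(2, 2), direct))  # True
```

Each row of C holds the nine kernel weights at the positions of the input elements covered by one window, with zeros elsewhere.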

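Deconvolution reverses the direction of this matrix operation by multiplying with the transpose of C, that is, \small X'=C^{T}Y. Because \small C^{T} is the transpose of C and not its inverse, X' has the same shape as X but in general not the same values.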


The picture below visually reflects the entire process:

Flatten the input into a 16×1 matrix and convert the convolution kernel into a 4×16 sparse matrix. Multiplying the two gives a 4×1 matrix, which is then reshaped into the 2×2 output.

Conversely, if the transpose of the sparse matrix built from the convolution kernel, \small C^{T} (16×4), is multiplied by the flattened output (4×1), the result (16×1) has the same shape as the flattened input (16×1).

Note: The two operations above are not inverses of each other. For the same convolution kernel, transposed convolution cannot restore the original values; it only restores the original shape.

As shown in the figure above, we convolve a 4×4 input with a 3×3 convolution kernel to obtain a 2×2 output, then apply deconvolution, and find that the result is not the original input. The reason is simple: the first output value only tells us that a sum over 9 input values equals 4.5, and one equation cannot determine 9 unknowns. Therefore, deconvolution cannot restore the original input; it can only guarantee that the shape is the same.
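The non-invertibility can be seen numerically. This is a small sketch with random values (the numbers are not those of the figure), assuming a stride-1, padding-0 convolution; here C is built column by column from the responses to unit impulses.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4))   # arbitrary 4x4 input
w = rng.standard_normal((3, 3))   # arbitrary 3x3 kernel

def conv_valid(img, kernel):
    """Stride-1, padding-0 forward convolution (no kernel flip)."""
    k = kernel.shape[0]
    n = img.shape[0] - k + 1
    return np.array([[np.sum(img[r:r + k, c:c + k] * kernel) for c in range(n)]
                     for r in range(n)])

# Column j of C is the flattened output produced by the j-th unit impulse.
C = np.stack([conv_valid(e.reshape(4, 4), w).reshape(-1) for e in np.eye(16)], axis=1)

y = C @ x.reshape(-1)             # convolution: 16 values -> 4 values
x_back = (C.T @ y).reshape(4, 4)  # transposed convolution: 4 values -> 16 values
rank = np.linalg.matrix_rank(C)

print(x_back.shape == x.shape)    # same shape as the input
print(np.allclose(x_back, x))     # but not the same values
print(rank)                       # at most 4, far fewer than 16 unknowns
```

Since C has at most 4 independent rows, the 4 output values can never pin down all 16 input values.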

3. Padding

1. Padding in convolution

① In full mode, convolution starts as soon as the convolution kernel first overlaps the image:

② In same mode, convolution starts when the center of the convolution kernel coincides with the corner of the image; this is also called half padding:

③ In valid mode, convolution is performed only where the convolution kernel lies entirely inside the image:
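The three modes can be compared in one dimension with NumPy's `np.convolve`, which supports the same three mode names. This is only an illustration: `np.convolve` flips the kernel (true convolution), which makes no difference for the symmetric kernel assumed here.

```python
import numpy as np

signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # length i = 6
kernel = np.array([1.0, 1.0, 1.0])                 # length k = 3

full = np.convolve(signal, kernel, mode="full")    # length i + k - 1
same = np.convolve(signal, kernel, mode="same")    # length i
valid = np.convolve(signal, kernel, mode="valid")  # length i - k + 1

print(len(full), len(same), len(valid))
print(valid)
```

Full mode produces the largest output, valid mode the smallest, and same mode keeps the input size.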

2. Padding in deconvolution

There are two ways to define it.

The first is intuitive: padding is the number of layers of zeros actually added around the input.

The second describes how far the convolution kernel overlaps the image at its starting position: when this padding is 0, the kernel at its initial position covers exactly 1 unit of the image. This is the convention of the padding argument in PyTorch; see the ConvTranspose2d page of the official PyTorch 1.13 documentation and the picture:

The two are related by padding'=kernel-1-padding, where padding follows the second definition and padding' is the number of zero layers actually added.

4. Stride

1. Stride in convolution: the step size with which the convolution kernel moves across the image.

2. Stride in deconvolution: the spacing between adjacent pixels of the input image, that is, stride-1 zeros are inserted between adjacent pixels.
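The zero insertion performed by the deconvolution stride can be shown on a small array. This is a minimal sketch; `insert_zeros` is an illustrative helper name, not a library function.

```python
import numpy as np

def insert_zeros(x, stride):
    """Place stride-1 zeros between adjacent elements of a 2-D array."""
    h, w = x.shape
    out = np.zeros(((h - 1) * stride + 1, (w - 1) * stride + 1), dtype=x.dtype)
    out[::stride, ::stride] = x  # original pixels land every `stride` positions
    return out

x = np.arange(1, 10).reshape(3, 3)
expanded = insert_zeros(x, stride=2)
print(expanded)
```

With stride=2 a 3×3 input grows to 5×5; with stride=1 the array is unchanged.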

5. Output image size formula

1. Output image size of convolution

\small o=\left\lfloor \frac{i+2p-k}{s} \right\rfloor+1

Taking the figure above as an example, with input=6, kernel=3, stride=2, padding=1, we get output=⌊(6+2-3)/2⌋+1=⌊2.5⌋+1=3.
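The calculation can be wrapped in a small helper for checking other parameter combinations. This is a sketch; the function name is illustrative, and rounding down is implemented with integer division.

```python
def conv_output_size(i, k, s=1, p=0):
    """Output size of a convolution: floor((i + 2p - k) / s) + 1."""
    return (i + 2 * p - k) // s + 1

print(conv_output_size(6, 3, s=2, p=1))  # the example above
print(conv_output_size(4, 3))            # the 4x4 input with a 3x3 kernel from section 2
```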

2. Output image size of deconvolution

\small o=\frac{i_{s}+2p-k}{1}+1

Note: The padding p in this formula is the number of layers of zeros added around the outside.

In deconvolution, the stride inserts stride-1 zeros between adjacent elements, so the actual input size after this transformation is \small i_{s}=i+(s-1)(i-1).

Assume that the input image size is 3x3, and other parameters are the same as before: kernel=3, padding=1, and stride=2.

Note: The stride of deconvolution is used to expand the input image; it is not the step size of the kernel movement, so the kernel step size in deconvolution is always 1.

Substituting, o=(3+1×2+2×1-3)/1+1=5, that is, the output size is 5.


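Note: If padding is instead defined by how far the convolution kernel overlaps the image at its starting position (the second definition in section 3), the deconvolution formula becomes

\small o=\frac{i_{s}+2(k-1-p)-k}{1}+1

The denominator is 1 because the kernel step size in deconvolution is always 1. This is equivalent to replacing p with p'=k-1-p.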
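As a check, the expression can be simplified by substituting the zero-inserted input size and using a kernel step of 1. The closed form below is derived from the formulas above and agrees with the output size given in the PyTorch ConvTranspose2d documentation when dilation is 1 and output_padding is 0.

```latex
\begin{aligned}
i_{s} &= i+(s-1)(i-1) = (i-1)s+1 \\
o &= \frac{i_{s}+2(k-1-p)-k}{1}+1 \\
  &= (i-1)s+1+2k-2-2p-k+1 \\
  &= (i-1)s-2p+k
\end{aligned}
```

For example, i=3, k=3, s=2, p=1 gives o=2×2-2+3=5.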


3. Examples

Note: The padding in this formula is the number of layers filled with 0 in the outer layer.

1. Input size input=2, kernel_size=3, stride=1, padding=2. What is the output size of the deconvolution?

i_s=i+(s-1)(i-1)=2+0×1=2

output=(i_s+2p-k)/1+1=(2+4-3)/1+1=4


2. Input size input=3, kernel=3, stride=2, padding=1. What is the output size of the deconvolution?

i_s=i+(s-1)(i-1)=3+1×2=5

output=(i_s+2p-k)/1+1=(5+2-3)/1+1=5
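Both examples can be reproduced with a few lines of Python. This is a sketch: the function name is illustrative, and p is the number of zero layers actually added around the outside, as stated in the note at the start of the examples.

```python
def deconv_output_size(i, k, s=1, p=0):
    """Deconvolution output size; p is the number of zero layers actually padded."""
    i_s = i + (s - 1) * (i - 1)   # input size after inserting stride-1 zeros
    return (i_s + 2 * p - k) + 1  # the kernel step size is always 1

print(deconv_output_size(2, 3, s=1, p=2))  # example 1
print(deconv_output_size(3, 3, s=2, p=1))  # example 2
```

To use the other padding definition, pass k-1-p in place of p.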

 


Origin blog.csdn.net/qq_54867493/article/details/128444220