# Deconvolution (Transposed Convolution, Fractionally Strided Convolution)




The concept of deconvolution first appeared in Zeiler's 2010 paper *Deconvolutional networks*, although the name "deconvolution" was not formally specified there; the term was officially used in the follow-up work *Adaptive deconvolutional networks for mid and high level feature learning*. With its successful application to neural network visualization, deconvolution has been adopted by more and more work in areas such as scene segmentation and generative models. It also goes by several other names, such as Transposed Convolution and Fractionally Strided Convolution.

The purpose of this article is mainly twofold:
1. To explain the relationship between convolutional and deconvolutional layers;
2. To clarify the relationship between the input feature size and output feature size of the deconvolutional layer.

## Convolutional layer

The convolutional layer should be familiar to everyone. For convenience of description, it is defined as follows:
- Two-dimensional discrete convolution ($N=2$)
- Square feature input ($i_1 = i_2 = i$)
- Square convolution kernel size ($k_1 = k_2 = k$)
- The same stride for each dimension ($s_1 = s_2 = s$)
- The same padding for each dimension ($p_1 = p_2 = p$)

The following figure shows the convolution computation with parameters $(i=5, k=3, s=2, p=1)$. From the result, it can be seen that the output feature size is $o_1 = o_2 = o = 3$.


The figure below shows the convolution computation with parameters $(i=6, k=3, s=2, p=1)$. From the result, it can be seen that the output feature size is again $o_1 = o_2 = o = 3$.


From the above two examples, we can conclude that the relationship between the convolutional layer's input and output feature sizes and its kernel parameters is:

$$o = \left\lfloor \frac{i + 2p - k}{s} \right\rfloor + 1$$

where $\lfloor x \rfloor$ denotes rounding $x$ down.
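
As a quick sanity check of this formula, here is a minimal Python sketch (the function name is ours, for illustration) evaluated on the two examples above:

```python
# Minimal check of the output-size formula o = floor((i + 2p - k) / s) + 1,
# assuming square inputs/kernels and equal stride and padding per dimension.
def conv_output_size(i: int, k: int, s: int, p: int) -> int:
    return (i + 2 * p - k) // s + 1

# Both examples from the figures above give o = 3.
assert conv_output_size(i=5, k=3, s=2, p=1) == 3
assert conv_output_size(i=6, k=3, s=2, p=1) == 3
```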

## Deconvolution layer

Before introducing deconvolution, let's take a look at the relationship between convolution operations and matrix operations.

### Convolution and matrix multiplication

Consider the following simple convolutional layer operation with parameters $(i=4, k=3, s=1, p=0)$ and output size $o=2$.


For the above convolution operation, we unroll the 3×3 convolution kernel shown in the figure into a sparse matrix $C$ of shape $[4, 16]$, shown below, where the non-zero element $w_{i,j}$ denotes the kernel element in row $i$ and column $j$:

$$C = \left(\begin{array}{cccccccccccccccc}
w_{0,0} & w_{0,1} & w_{0,2} & 0 & w_{1,0} & w_{1,1} & w_{1,2} & 0 & w_{2,0} & w_{2,1} & w_{2,2} & 0 & 0 & 0 & 0 & 0 \\
0 & w_{0,0} & w_{0,1} & w_{0,2} & 0 & w_{1,0} & w_{1,1} & w_{1,2} & 0 & w_{2,0} & w_{2,1} & w_{2,2} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & w_{0,0} & w_{0,1} & w_{0,2} & 0 & w_{1,0} & w_{1,1} & w_{1,2} & 0 & w_{2,0} & w_{2,1} & w_{2,2} & 0 \\
0 & 0 & 0 & 0 & 0 & w_{0,0} & w_{0,1} & w_{0,2} & 0 & w_{1,0} & w_{1,1} & w_{1,2} & 0 & w_{2,0} & w_{2,1} & w_{2,2}
\end{array}\right)$$

We then flatten the 4×4 input feature into a matrix $X$ of shape $[16, 1]$; then $Y = CX$ is an output feature matrix of shape $[4, 1]$, and rearranging it into 2×2 gives the final output feature. From this analysis we can see that the computation of a convolutional layer can in fact be converted into a matrix multiplication. It is worth noting that some open-source deep learning frameworks do not compute convolution via this conversion, because it involves many useless multiplications by zero. For the specific way convolution is computed in Caffe, see *Implementing convolution as a matrix multiplication*.
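
To make this concrete, here is a minimal NumPy sketch (kernel and input values are arbitrary, chosen only for illustration) that builds the $[4, 16]$ matrix $C$ for the $(i=4, k=3, s=1, p=0)$ example and checks that $Y = CX$ matches the direct sliding-window computation:

```python
import numpy as np

# Build the sparse matrix C for the (i=4, k=3, s=1, p=0) example and verify
# that the matrix-multiplication view agrees with direct convolution.
i, k = 4, 3
o = i - k + 1                                     # = 2, so C is [o*o, i*i] = [4, 16]
w = np.arange(k * k, dtype=float).reshape(k, k)   # arbitrary 3x3 kernel

C = np.zeros((o * o, i * i))
for r in range(o):                  # output row
    for c in range(o):              # output column
        for u in range(k):          # kernel row
            for v in range(k):      # kernel column
                C[r * o + c, (r + u) * i + (c + v)] = w[u, v]

x = np.arange(i * i, dtype=float).reshape(i, i)   # arbitrary 4x4 input
y_mat = (C @ x.reshape(-1)).reshape(o, o)         # matrix-multiplication path

# Direct sliding-window computation (cross-correlation, as in CNNs).
y_direct = np.array([[np.sum(w * x[r:r + k, c:c + k]) for c in range(o)]
                     for r in range(o)])
assert np.allclose(y_mat, y_direct)
```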

Through the above analysis, we know that the forward pass of the convolutional layer can be expressed as multiplication by the matrix $C$; it then follows easily that the backpropagation of the convolutional layer is multiplication by $C^T$.

### The relationship between deconvolution and convolution

We have already said that deconvolution is also called Transposed Convolution. The forward pass of the convolutional layer is in fact the backward pass of the deconvolutional layer, and the backward pass of the convolutional layer is the forward pass of the deconvolutional layer. Since the forward and backward computations of the convolutional layer are multiplications by $C$ and $C^T$ respectively, while those of the deconvolutional layer are multiplications by $C^T$ and $(C^T)^T$ respectively, their forward and backward passes are simply swapped.
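
Under the same assumptions as the previous snippet, a short sketch of this swap: multiplying by $C^T$ (the deconvolution forward pass) maps a 2×2 feature back up to a 4×4 feature.

```python
import numpy as np

# Reuse the construction of C from the previous snippet; the deconvolution
# forward pass is multiplication by C^T, mapping [4, 1] back to [16, 1].
i, k = 4, 3
o = i - k + 1
w = np.random.randn(k, k)

C = np.zeros((o * o, i * i))
for r in range(o):
    for c in range(o):
        for u in range(k):
            for v in range(k):
                C[r * o + c, (r + u) * i + (c + v)] = w[u, v]

y = np.random.randn(o * o, 1)       # a 2x2 feature map, flattened to [4, 1]
x_up = C.T @ y                      # deconvolution forward pass: shape [16, 1]
print(x_up.reshape(i, i).shape)     # (4, 4) -- back to the original input size
```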

The figure below shows the deconvolution operation corresponding to the convolution computation above, where the input and output relationships are exactly reversed. If the deconvolution is computed as the reverse of the convolution operation, ignoring channels, we can also compute the deconvolution using the ordinary discrete convolution method (this is only for illustration; it is not done this way in practice).

Also for illustration, the parameters of the deconvolution operation are defined as follows:

- Two-dimensional discrete convolution ($N=2$)
- Square feature input ($i'_1 = i'_2 = i'$)
- Square convolution kernel size ($k'_1 = k'_2 = k'$)
- The same stride for each dimension ($s'_1 = s'_2 = s'$)
- The same padding for each dimension ($p'_1 = p'_2 = p'$)

The following figure shows the deconvolution operation with parameters $(i'=2, k'=3, s'=1, p'=2)$; the corresponding convolution operation has parameters $(i=4, k=3, s=1, p=0)$. We can see that the corresponding convolution and deconvolution operations satisfy $k = k'$ and $s = s'$, but the deconvolution has an extra padding $p' = 2$. By comparison we find that the input in the top-left corner of the convolutional layer contributes only to the top-left output, which is why the deconvolution layer has $p' = k - p - 1 = 2$. From the schematic we find that, in the case $s = s' = 1$, the relationship between the input and output of the deconvolution layer is:

$$o' = i' - k' + 2p' + 1 = i' + (k-1) - 2p$$
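
As a quick numeric check of this relation, take the $(i=4, k=3, s=1, p=0)$ example above, whose matching deconvolution has $(i'=2, k'=3, s'=1, p'=2)$:

```python
# Check that the s = s' = 1 deconvolution recovers the original input size.
i, k, p = 4, 3, 0
i_prime, k_prime = 2, k             # deconvolution input is the conv output (o = 2)
p_prime = k - p - 1                 # = 2
o_prime = i_prime - k_prime + 2 * p_prime + 1
assert o_prime == i == 4            # the deconvolution output is 4x4 again
```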


### Fractionally Strided Convolution

As mentioned above, deconvolution is sometimes called Fractionally Strided Convolution, meaning, roughly, convolution with a fractional stride. For a convolution with stride $s > 1$, we may think of the corresponding deconvolution as having stride $s' < 1$. The following figure shows the deconvolution operation corresponding to the convolution with parameters $i=5, k=3, s=2, p=1$ shown in the first figure. The fractional stride of the deconvolution can be understood as follows: insert $s-1$ zeros between the units of the input feature, treat the result as a new input feature, and the stride $s'$ is then no longer a fraction but 1. Combining this with the conclusions above, we obtain the input-output relationship of Fractionally Strided Convolution:

$$o' = s(i' - 1) + k - 2p$$
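
This size relation can also be checked via the zero-insertion view. The sketch below (a size-only computation under the single-channel assumptions above, with function names ours) dilates the input with $s-1$ zeros, pads by $k-p-1$, applies the stride-1 output-size formula, and reproduces $o' = s(i'-1) + k - 2p$:

```python
# Size-only check of the fractionally strided view: insert s-1 zeros between
# input units, pad by k-p-1, then apply the stride-1 conv output-size formula.
def fractionally_strided_size(i_prime: int, k: int, s: int, p: int) -> int:
    i_dilated = i_prime + (i_prime - 1) * (s - 1)   # zeros inserted between units
    p_prime = k - p - 1                             # padding of the equivalent conv
    return i_dilated + 2 * p_prime - k + 1          # stride-1 output size

# Example from the text: the deconvolution matching (i=5, k=3, s=2, p=1).
# Its input is the 3x3 conv output, and it recovers the original size 5.
i, k, s, p = 5, 3, 2, 1
i_prime = (i + 2 * p - k) // s + 1                  # = 3
assert fractionally_strided_size(i_prime, k, s, p) == s * (i_prime - 1) + k - 2 * p == 5
```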


## References

- conv_arithmetic (visualizations of convolution and deconvolution): https://github.com/vdumoulin/conv_arithmetic

- Is the deconvolution layer the same as a convolutional layer?


Reprinted from: http://buptldy.github.io/2016/10/29/2016-10-29-deconv/

