[Machine Learning] A Detailed Explanation of Transposed Convolution (Transpose Convolution)

Table of contents

1. The background of transposed convolution

2. Applications of transposed convolution

3. The difference between transposed convolution and standard convolution

4. Derivation of transposed convolution

5. Output size of transposed convolution

5.1 stride = 1

5.2 stride > 1 ☆

6. Summary


1. The background of transposed convolution

        Usually, after an image passes through multiple convolution operations, the size of the feature map keeps shrinking. For some specific tasks (such as image segmentation and image generation), the feature map must be restored to the original image size before further processing. This restoration operation, which maps a feature map from a small resolution to a larger resolution, is called upsampling (Upsample), as shown in the following figure:

Figure 1 Upsampling example

        There are many upsampling methods; see "[Image Processing] Detailed Explanation of Nearest Neighbor Interpolation, Linear Interpolation, Bilinear Interpolation, Bicubic Interpolation_Wen Shao-CSDN Blog" for details. However, these upsampling methods are designed from prior knowledge: their rules are fixed and not learnable, so the results are unsatisfactory in many scenarios. We would therefore like the neural network to learn how to interpolate by itself, which is exactly what the transposed convolution introduced next does.


2. Applications of transposed convolution

        Transposed convolution used to be called deconvolution (Deconvolution). Compared with traditional upsampling methods, transposed convolution does not rely on a preset interpolation rule; like standard convolution, it has learnable parameters, so the network can learn the upsampling that best suits the task.
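
        As a minimal sketch (assuming PyTorch is available, with made-up layer sizes), the following shows that a transposed convolution layer upsamples a feature map while carrying ordinary learnable weights:

```python
# A minimal sketch (assuming PyTorch): a transposed convolution layer
# upsamples a feature map and, unlike fixed interpolation, its kernel
# weights are ordinary learnable parameters updated by backpropagation.
import torch
import torch.nn as nn

upsample = nn.ConvTranspose2d(in_channels=16, out_channels=8,
                              kernel_size=3, stride=2, padding=0)

x = torch.randn(1, 16, 8, 8)           # a small 8x8 feature map
y = upsample(x)                        # spatial size: 2*(8-1)+3 = 17
print(y.shape)                         # torch.Size([1, 8, 17, 17])
print(upsample.weight.requires_grad)   # True -> the weights are learned
```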

        Transposed convolution is widely used in several areas, for example:

  • In DCGAN, the generator transforms random values into a full-size image, which requires transposed convolution.
  • In semantic segmentation, convolutional layers are used in the encoder to extract features, and then the original size is restored in the decoder to classify each pixel in the original image. This process also requires transposed convolution. The classic methods are FCN and U-Net.
  • CNN Visualization: Restore CNN feature maps to pixel space via transposed convolution to observe which patterns of images a particular feature map is sensitive to.

3. The difference between transposed convolution and standard convolution

        A standard convolution multiplies the elements of the convolution kernel with the elements at the corresponding positions of the input matrix and sums the products. The kernel then slides over the input matrix with a given stride until every valid position of the input has been visited.

        Assume the input is a 4×4 matrix and a 3×3 standard convolution with padding=0 and stride=1 is applied. The output is then a 2×2 matrix, as shown in Figure 2:

Figure 2 Standard convolution example

        In the example above, the values in the 3×3 region in the upper-right corner of the input matrix (the yellow 2, 3, 4) affect the value in the upper-right corner of the output matrix (the yellow 27); this corresponds to the concept of the receptive field in standard convolution. In other words, the 3×3 convolution kernel establishes a mapping from 9 values of the input matrix to 1 value of the output matrix.

        In summary, the standard convolution operation establishes a many-to-one mapping.
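
        The following NumPy sketch illustrates this many-to-one mapping with a 4×4 input and a 3×3 kernel; the numeric values are illustrative, not the ones shown in Figure 2:

```python
# A minimal NumPy sketch of the standard ("valid") convolution above:
# each 3x3 window of the 4x4 input maps to one output value (many-to-one).
# Input and kernel values are illustrative, not those in the figure.
import numpy as np

x = np.arange(16, dtype=float).reshape(4, 4)   # 4x4 input (illustrative)
k = np.ones((3, 3))                            # 3x3 kernel (illustrative)

out = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        out[i, j] = np.sum(x[i:i+3, j:j+3] * k)   # 9 inputs -> 1 output
print(out)   # 2x2 output, as in Figure 2
```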

        For transposed convolution, we want the reverse operation, i.e. a one-to-many mapping. For the example above, what we want to establish is the relationship between 1 value of the standard convolution's output matrix and the 9 values of its input matrix, as shown in Figure 3:

Figure 3 Example of convolution reverse operation

         Of course, from an information-theoretic point of view, the standard convolution operation is irreversible, so transposed convolution does not recover the original input matrix from the output matrix and the kernel; it only computes a matrix that preserves the relative positional relationships.


4. Derivation of transposed convolution

        Define a 4×4 input matrix input :

        Then define a 3×3 standard convolution kernel kernel :

        Set stride=1 and padding=0; then, using the "valid" convolution mode, a 2×2 output matrix output is obtained:

        Now, changing the representation, flatten the input matrix input and the output matrix output into a 16×1 column vector X and a 4×1 column vector Y, respectively:

        The standard convolution can then be described as a matrix multiplication by introducing a new convolution matrix C such that Y = C X:

        Working out the convolution relationships, C turns out to be a 4×16 sparse matrix:

         Figure 4 illustrates this matrix multiplication:

Figure 4 Example of standard convolution matrix operations
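
        The sketch below (NumPy, with illustrative kernel and input values rather than the ones in the figures) builds such a 4×16 matrix C from a 3×3 kernel and checks that Y = C X reproduces the sliding-window convolution:

```python
# A sketch (illustrative values) of the matrix view of convolution:
# flatten the 4x4 input into a 16x1 vector X, build the 4x16 sparse
# matrix C from the 3x3 kernel, and check that Y = C @ X matches the
# sliding-window convolution.
import numpy as np

x = np.arange(16, dtype=float).reshape(4, 4)
k = np.arange(1, 10, dtype=float).reshape(3, 3)

# Build C: one row per output position, one column per input position.
C = np.zeros((4, 16))
for oi in range(2):
    for oj in range(2):
        for ki in range(3):
            for kj in range(3):
                C[oi * 2 + oj, (oi + ki) * 4 + (oj + kj)] = k[ki, kj]

X = x.reshape(16, 1)
Y = C @ X                                   # 4x1 column vector

# Reference: direct sliding-window convolution.
ref = np.array([[np.sum(x[i:i+3, j:j+3] * k) for j in range(2)]
                for i in range(2)])
print(np.allclose(Y.reshape(2, 2), ref))    # True
```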

        The transposed convolution essentially reverses this process, i.e. a 16×1 vector is obtained from C and Y: X' = C^T Y.

         Here, C^T is a new 16×4 sparse matrix. Figure 5 below shows this transposed matrix multiplication. Note that the weight matrix C^T used by the transposed convolution does not have to come from the original convolution matrix C (in practice it usually does not); what matters is that its shape is the same as that of the transpose of the original convolution matrix.

Figure 5 Example of convolution matrix operation after transposition

        Finally, by reordering the 16×1 result into a 4×4 matrix, a 4×4 output is obtained from the 2×2 input.
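
        A small follow-up sketch (again with illustrative values) shows this reverse direction: C^T maps a 4×1 vector Y to a 16×1 vector, which is then reordered into a 4×4 matrix:

```python
# Continuing the matrix view (illustrative values): the transposed operation
# maps a 4x1 vector Y back to a 16x1 vector via C^T, which is reshaped into
# a 4x4 matrix. Only the positional layout is recovered, not the original
# input values.
import numpy as np

k = np.arange(1, 10, dtype=float).reshape(3, 3)

# Same 4x16 convolution matrix C as in the previous sketch.
C = np.zeros((4, 16))
for oi in range(2):
    for oj in range(2):
        for ki in range(3):
            for kj in range(3):
                C[oi * 2 + oj, (oi + ki) * 4 + (oj + kj)] = k[ki, kj]

Y = np.array([[1.0], [2.0], [3.0], [4.0]])   # a 4x1 "small" feature map
X_up = C.T @ Y                               # 16x4 times 4x1 -> 16x1
print(X_up.reshape(4, 4))                    # reordered into a 4x4 matrix
```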


5. Output size of transposed convolution

5.1 stride = 1

        Similarly, use the convolution matrix C obtained above from the 3×3 kernel:

        The output matrix output is still:

        Expand the output matrix into  a column vector Y :

        Substituting into the transposed-convolution formula above, the result of the transposed convolution is:

        This is actually equivalent to first zero-padding the input matrix input with padding=2:

        Then, rotating the standard convolution kernel kernel by 180° (flipping it both horizontally and vertically, which is what "transposing" the kernel refers to here):

        Finally, a standard convolution is performed on the zero-padded input matrix input with this rotated kernel, as shown in Figure 6:

Figure 6 Example of transposed convolution operation for s=1 (step size s'=1)
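
        This equivalence can be checked numerically; the sketch below (assuming PyTorch, with random illustrative values) compares F.conv_transpose2d with a plain convolution applied to the padded input using the 180°-rotated kernel:

```python
# A quick check (assuming PyTorch) that a stride=1 transposed convolution
# equals a standard convolution applied to the input padded by k-1 on each
# side, with the kernel rotated by 180 degrees. Values are random.
import torch
import torch.nn.functional as F

y = torch.randn(1, 1, 2, 2)           # the 2x2 "small" input
w = torch.randn(1, 1, 3, 3)           # a 3x3 kernel

out_t = F.conv_transpose2d(y, w, stride=1, padding=0)   # 4x4 output

y_pad = F.pad(y, (2, 2, 2, 2))         # pad by k-1 = 2 on every side
w_flip = torch.flip(w, dims=[-2, -1])  # rotate the kernel 180 degrees
out_c = F.conv2d(y_pad, w_flip, stride=1, padding=0)     # 4x4 output

print(torch.allclose(out_t, out_c))    # True
```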

        More generally, for a transposed convolution with kernel size k, stride s = 1 and padding p = 0, applied to an input matrix of size i', the equivalent standard convolution produces an output feature map of size:

o' = (i' - 1) + k

        Before the convolution, the input matrix of this equivalent standard convolution must be zero-padded with padding' = k - 1, giving an intermediate size i'' = i' + 2(k - 1).

        In fact, this follows from the standard convolution output-size formula (with equivalent stride s' = 1 and padding p = 0):

o' = \frac{i'' - k + 2p} {s'} + 1 = i' + 2(k-1) - k + 1 = (i'-1) + k
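
        A quick sanity check of this size formula (assuming PyTorch, with a few arbitrary sizes):

```python
# Sanity-checking the size formula o' = (i' - 1) + k for a stride=1,
# padding=0 transposed convolution (assuming PyTorch; sizes are arbitrary).
import torch
import torch.nn.functional as F

for i_prime, k in [(2, 3), (4, 3), (5, 5)]:
    y = torch.randn(1, 1, i_prime, i_prime)
    w = torch.randn(1, 1, k, k)
    o = F.conv_transpose2d(y, w, stride=1, padding=0).shape[-1]
    print(i_prime, k, o, o == (i_prime - 1) + k)   # always True
```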


5.2 stride > 1 ☆

        In practice, we most often use transposed convolution with stride > 1 to obtain a larger upsampling ratio.

        Below, the input size is 5×5, the standard convolution kernel is the same 3×3 kernel as above (kernel size k = 3), the stride is s = 2, and padding p = 0. After the standard convolution, the output matrix size is 2×2:

        Here, the transposed sparse matrix C^T becomes 25×4 (the convolution matrix C is 4×25). Since the matrix is too large, it is not written out here. The final result of the transposed convolution is:
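
        Since the matrix is too large to write out, the sketch below (NumPy, with illustrative kernel values) constructs it programmatically and applies C^T to a 4×1 vector:

```python
# Building the stride=2 convolution matrix programmatically (illustrative
# kernel values): C has shape 4x25 and its transpose C^T has shape 25x4.
import numpy as np

k = np.arange(1, 10, dtype=float).reshape(3, 3)   # 3x3 kernel
i_size, s, out = 5, 2, 2                          # 5x5 input, stride 2 -> 2x2

C = np.zeros((out * out, i_size * i_size))
for oi in range(out):
    for oj in range(out):
        for ki in range(3):
            for kj in range(3):
                C[oi * out + oj, (oi * s + ki) * i_size + (oj * s + kj)] = k[ki, kj]

print(C.shape, C.T.shape)        # (4, 25) (25, 4)
Y = np.arange(1, 5, dtype=float).reshape(4, 1)
print((C.T @ Y).reshape(5, 5))   # transposed-convolution result, 5x5
```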

        This is equivalent to simultaneously inserting holes (zeros) into the input matrix and padding it, and then applying a stride-1 standard convolution with the 180°-rotated kernel. The process is shown in Figure 7:

Figure 7 When s=2, an example of transposed convolution operation (step size s'=1)
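
        The stride-2 equivalence can also be verified numerically; the sketch below (assuming PyTorch, with random illustrative values) inserts the holes and padding by hand and compares the result against F.conv_transpose2d with stride=2:

```python
# A check (assuming PyTorch) of the stride=2 case: insert s-1 = 1 zero
# between neighbouring input elements, pad the result by k-1 = 2, then run
# a plain stride-1 convolution with the 180-degree-rotated kernel. This
# matches F.conv_transpose2d with stride=2. Values are random.
import torch
import torch.nn.functional as F

y = torch.randn(1, 1, 2, 2)
w = torch.randn(1, 1, 3, 3)

out_t = F.conv_transpose2d(y, w, stride=2, padding=0)    # 5x5 output

z = torch.zeros(1, 1, 3, 3)           # 2 + (2-1)*(s-1) = 3
z[..., ::2, ::2] = y                  # insert holes between the elements
z = F.pad(z, (2, 2, 2, 2))            # pad by k-1 = 2 on every side -> 7x7
w_flip = torch.flip(w, dims=[-2, -1])
out_c = F.conv2d(z, w_flip, stride=1, padding=0)         # 7-3+1 = 5 -> 5x5

print(torch.allclose(out_t, out_c))   # True
```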

        More generally, for a transposed convolution with kernel size k, stride s > 1 and padding p = 0, applied to an input matrix of size i', the equivalent standard convolution produces an output feature map of size:

o' = s(i' - 1) + k

        Before the convolution, the input matrix of this equivalent standard convolution must be zero-padded with padding' = k - 1; in addition, s - 1 holes (zeros) are inserted between every pair of adjacent elements, i.e. into each of the i' - 1 gaps along each dimension. The actual size is therefore i'' = i' + 2(k-1) + (i'-1)(s-1) = s(i'-1) + 2k - 1.

        In fact, this follows from the standard convolution output-size formula (with equivalent stride s' = 1 and padding p = 0):

o' = \frac{i'' - k + 2p} {s'} + 1 = s(i' - 1) + 2k - 1 - k + 1 = s(i' - 1) + k

        It can be seen that the upsampling ratio can be controlled by the stride s; this role is analogous to that of the dilation rate (number of holes) in dilated (atrous) convolution.
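
        A short check of the relation between stride and output size (assuming PyTorch):

```python
# Sanity check (assuming PyTorch): the stride s of the transposed
# convolution controls the upsampling ratio, with output size
# o' = s(i' - 1) + k.
import torch
import torch.nn.functional as F

i_prime, k = 5, 3
y = torch.randn(1, 1, i_prime, i_prime)
w = torch.randn(1, 1, k, k)
for s in (1, 2, 3, 4):
    o = F.conv_transpose2d(y, w, stride=s, padding=0).shape[-1]
    print(s, o, o == s * (i_prime - 1) + k)   # True for every stride
```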


6. Summary 

        Note: the actual weights of the transposed convolution do not have to come from the original convolution matrix; what matters is that the weights are arranged like the transpose of a convolution matrix. A transposed convolution forms the same connectivity pattern as an ordinary convolution, but in the opposite direction.

        We can use transposed convolution for upsampling, and  the weights of transposed convolution are learnable , so there is no need for a predefined interpolation method.

        Even though it is called transposed convolution, this does not mean that we take an existing convolution matrix and use its transposed version. The point is that the correlation between input and output is handled in the opposite direction compared with a standard convolution matrix (a one-to-many instead of a many-to-one correspondence).

        Therefore, a transposed convolution is not, strictly speaking, a convolution, but a convolution can be used to simulate it: upsampling the input matrix by inserting zeros between (and padding around) its values, followed by a regular convolution, produces the same result as a transposed convolution. You may find articles explaining transposed convolution this way. However, this is less efficient, because the input has to be upsampled before the regular convolution.

        Note: transposed convolution tends to produce grid/checkerboard artifacts in the generated image, and many follow-up works address this problem.


References:

Listen to Liu Xiao Paddle talking about AI | Issue 5: A variant of convolution: transposed convolution

https://towardsdatascience.com/up-sampling-with-transposed-convolution-9ae4f2df52d0

Original article: blog.csdn.net/qq_39478403/article/details/121181904