A simple and in-depth understanding of transposed convolution Conv2DTranspose

If you review the past and learn the new, you can become a teacher!

1. Reference materials

Paper: A guide to convolution arithmetic for deep learning
github source code: Convolution arithmetic
Bilibili video: Transposed convolution
Transposed Convolution
[Keras/TensorFlow/PyTorch] A detailed and easy-to-understand explanation of Conv2D and Conv2DTranspose
How to explain deconvolution?
The deconvolution operation Conv2DTranspose

2. Standard convolution (Conv2D)

[Figure: standard convolution]

1. Conv2D calculation formula

The standard convolution output-size formula is:
$$o=\left\lfloor\frac{i+2p-k}{s}\right\rfloor+1 \qquad \begin{array}{l} i=\textit{size of input}\\ o=\textit{size of output}\\ p=\textit{padding}\\ k=\textit{size of kernel}\\ s=\textit{stride}\end{array}$$

where $\lfloor\cdot\rfloor$ denotes rounding down (the floor operation).

Taking the feature-map height (Height) as an example, the output feature-map height after the convolution operation is:
$$H_{out}=\frac{H_{in}+2p-k}{s}+1 \qquad (1)$$

2. The stride in Conv2D

2.1 When step size stride=1,p=0,k=3

[Figure: standard convolution with k=3, s=1, p=0: 4×4 input → 2×2 output]

Input feature map (blue): $(H_{in},W_{in})=(4,4)$.
Given parameters: $kernel\_size(k)=3$, $stride(s)=1$, $padding(p)=0$.
Output feature map (green): $(H_{out},W_{out})=(2,2)$.

Substituting into formula (1), we get:
$$H_{out}=\frac{H_{in}+2p-k}{s}+1=\frac{4+2*0-3}{1}+1=2$$

2.2 When step size stride=2,p=1,k=3

[Figure: standard convolution with k=3, s=2, p=1: 5×5 input → 3×3 output]

Input feature map (blue): $(H_{in},W_{in})=(5,5)$.
Given parameters: $kernel\_size(k)=3$, $stride(s)=2$, $padding(p)=1$.
Output feature map (green): $(H_{out},W_{out})=(3,3)$.

Substituting into formula (1), we get:
$$H_{out}=\frac{H_{in}+2p-k}{s}+1=\frac{5+2*1-3}{2}+1=3$$
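As a quick sanity check (not part of the original post; a minimal sketch assuming PyTorch is available), the two cases above can be verified directly with torch.nn.Conv2d:

import torch
import torch.nn as nn

x1 = torch.randn(1, 1, 4, 4)                      # case 2.1: i=4, k=3, s=1, p=0
conv1 = nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=0)
print(conv1(x1).shape)                            # torch.Size([1, 1, 2, 2])

x2 = torch.randn(1, 1, 5, 5)                      # case 2.2: i=5, k=3, s=2, p=1
conv2 = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1)
print(conv2(x2).shape)                            # torch.Size([1, 1, 3, 3])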

3. Transpose convolution (Conv2DTranspose)

1. Introduction

For many tasks and generative models (semantic segmentation, autoencoders, the generator in a GAN, etc.), we often want a transformation in the opposite direction of standard convolution, that is, upsampling. In semantic segmentation, for example, an encoder first extracts feature maps, and a decoder then restores the original image size so that every pixel of the original image can be classified.

Traditional methods of achieving upsampling are to apply interpolation schemes or manually create rules. Modern architectures such as neural networks tend to allow the network to automatically learn appropriate transformations without human intervention. To do this we can use transposed convolution.

2. Misunderstanding of the name of transposed convolution

This operation is sometimes called "deconvolution" after (Zeiler et al., 2010), but is really the transpose (gradient) of conv2d rather than an actual deconvolution.
Deconvolutional Networks: Zeiler et al., 2010 (pdf)

Transposed convolution is also called deconvolution (deconv) or inverse convolution. However, transposed convolution is currently the most formal and mainstream name, because it describes the computation of Conv2DTranspose most accurately, while the other names are easily misleading. In the mainstream deep learning frameworks, such as TensorFlow, PyTorch, and Keras, the function names are all variants of conv_transpose. Therefore, before learning transposed convolution we should settle on the standard name, and when others talk about deconvolution or inverse convolution we should help correct them, so that the incorrect naming is buried in the long river of history as soon as possible.

Expressed as a formula, the standard convolution operation can be written as $y=Cx$, where $C$ is the convolution matrix and $x$ is the (flattened) input. A true inverse convolution, in the mathematical sense, would be $x=C^{-1}y$; what the "deconvolution" layer actually computes is $x'=C^{T}y$. The formulas already show that the operation commonly called "deconvolution" is more accurately named "transposed convolution".

Let us first see why people like to call transposed convolution "deconvolution" or "inverse convolution". Consider an example: a 4x4 input convolved with a 3x3 kernel (no padding, stride=1) gives a 2x2 output. A transposed convolution takes a 2x2 input through a kernel of the same 3x3 size and produces a 4x4 output, which looks like the inverse of the ordinary convolution. Just as subtraction is the inverse of addition and division is the inverse of multiplication, it is natural to assume the two operations form a reversible pair. But transposed convolution is not the inverse of convolution (a convolution is in general not invertible); transposed convolution is itself a convolution. It is not the full inverse of the forward convolution: it cannot recover the element values of the input matrix, only its size (shape). This is where the name "transposed convolution" comes from, rather than "deconvolution" or "inverse convolution". A bad name easily misleads people.

In some literature, transposed convolution is also called convolution with fractional strides, deconvolution, or backwards strided convolution, but "deconvolution" is misleading and not recommended. The blogger therefore strongly recommends the names Conv2DTranspose and "convolution with fractional strides", which correspond to the code and to the academic papers respectively.

I think transpose_conv2d or conv2d_transpose are the cleanest names.

3. Introduction to Conv2DTranspose

3.1 The concept of Conv2DTranspose

Transposed convolution is a special kind of forward convolution. It first enlarges the input feature map by inserting zero elements according to a certain ratio, then flips the convolution kernel up-down and left-right, and finally performs an ordinary forward convolution. Transposed convolution is common in semantic segmentation and in generative adversarial networks (GANs), where its main role is upsampling (UpSampling).
[Figure: transposed convolution used for upsampling]

3.2 Comparison of Conv2D and Conv2DTranspose

There is a big difference between transposed convolution and standard convolution. Direct convolution uses a "small window" to see a "big world", while transposed convolution uses a part of a "big window" to see a "small world".

In standard convolution (a large map becomes a small map), the input is (5,5), the stride is (2,2), and the output is (3,3).

[Figure: standard convolution, (5,5) → (3,3)]

In the transposed convolution operation (a small map becomes a large map), the input is (3,3) and the output is (5,5).
[Figure: transposed convolution, (3,3) → (5,5)]

3.3 Use matrix multiplication to describe transposed convolution

For the input matrix X and the output matrix Y, the standard convolution can be described by the matrix operation:
$$Y=CX$$
Transposed convolution performs the reverse of this mapping, i.e. it goes from $Y$ back to $X$ through $C$. From the sizes of the matrices we can directly read off the transposed-convolution computation:
$$X'=C^{T}Y$$
If you substitute actual numbers you will find that transposed convolution only restores the size of the matrix X; it cannot restore the individual element values of X. The proof of this conclusion is given later.

3.4 Mathematical derivation of transposed convolution

Reference: Transpose Convolution (Transposed Convolution)
Reference: Peeling back the layers: understanding transposed convolution (deconvolution)

Define an input matrix input of size 4×4:
$$input=\begin{bmatrix}x_1&x_2&x_3&x_4\\x_5&x_6&x_7&x_8\\x_9&x_{10}&x_{11}&x_{12}\\x_{13}&x_{14}&x_{15}&x_{16}\end{bmatrix}$$
and a standard convolution kernel kernel of size 3×3:
$$kernel=\begin{bmatrix}w_{0,0}&w_{0,1}&w_{0,2}\\w_{1,0}&w_{1,1}&w_{1,2}\\w_{2,0}&w_{2,1}&w_{2,2}\end{bmatrix}$$
Let $strides=1$ and $padding=0$, i.e. $i=4,k=3,s=1,p=0$. According to formula (1), the output matrix output has size 2×2:
$$output=\begin{bmatrix}y_1&y_2\\y_3&y_4\end{bmatrix}$$
Here we change the representation: we flatten the input matrix input and the output matrix output into column vectors X and Y, whose dimensions are 16×1 and 4×1 respectively.

Flatten the input matrix input into a 16×1 column vector X:
$$X=\begin{bmatrix}x_1&x_2&x_3&x_4&x_5&x_6&x_7&x_8&x_9&x_{10}&x_{11}&x_{12}&x_{13}&x_{14}&x_{15}&x_{16}\end{bmatrix}^T$$
Flatten the output matrix output into a 4×1 column vector Y:
$$Y=\begin{bmatrix}y_1&y_2&y_3&y_4\end{bmatrix}^T$$
The standard convolution can then be described as a matrix operation, where the matrix C represents the standard convolution kernel:
$$Y=CX$$
By construction, this sparse matrix C has dimensions 4×16:
$$C=\begin{bmatrix}w_{0,0}&w_{0,1}&w_{0,2}&0&w_{1,0}&w_{1,1}&w_{1,2}&0&w_{2,0}&w_{2,1}&w_{2,2}&0&0&0&0&0\\0&w_{0,0}&w_{0,1}&w_{0,2}&0&w_{1,0}&w_{1,1}&w_{1,2}&0&w_{2,0}&w_{2,1}&w_{2,2}&0&0&0&0\\0&0&0&0&w_{0,0}&w_{0,1}&w_{0,2}&0&w_{1,0}&w_{1,1}&w_{1,2}&0&w_{2,0}&w_{2,1}&w_{2,2}&0\\0&0&0&0&0&w_{0,0}&w_{0,1}&w_{0,2}&0&w_{1,0}&w_{1,1}&w_{1,2}&0&w_{2,0}&w_{2,1}&w_{2,2}\end{bmatrix}$$
The above matrix operation is shown in the figure below:
[Figure: standard convolution written as the sparse matrix-vector product Y = CX]

For transposed convolution we go in the reverse direction of the forward convolution, i.e. from $Y$ back to $X$ through $C$:
$$X'=C^{T}Y$$
The new sparse matrix is $C^T$ with dimensions 16×4; the figure below visualizes the transposed operation. Note that the weight matrix used in a transposed convolution layer does not have to come from the original standard convolution kernel; only its shape matches that of the transposed convolution matrix.
[Figure: transposed convolution written as the sparse matrix-vector product X' = C^T Y]

Then, by rearranging the 16×1 output vector, we obtain a 4×4 output matrix from the 2×2 input matrix.

Note: if you plug in numbers you will find that transposed convolution only restores the size of the matrix X; it cannot restore the element values of X. The proof of this conclusion is given later.
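To make the matrix view concrete, here is a small NumPy sketch (an illustrative addition, using arbitrary example values for X and the kernel): it builds the 4×16 sparse matrix C for a 4×4 input and a 3×3 kernel (s=1, p=0), checks that Y = CX reproduces the convolution output, and shows that CᵀY only restores the shape of X, not its values.

import numpy as np

X = np.arange(1, 17, dtype=np.float64)                 # flattened 4x4 input x1..x16
W = np.arange(1, 10, dtype=np.float64).reshape(3, 3)   # example 3x3 kernel w_{0,0}..w_{2,2}

# build the 4x16 convolution matrix C: one row per output position (r, c)
C = np.zeros((4, 16))
for out_idx, (r, c) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)]):
    for i in range(3):
        for j in range(3):
            C[out_idx, (r + i) * 4 + (c + j)] = W[i, j]

Y = C @ X                                              # standard convolution as a matrix product

# direct 'valid' cross-correlation for comparison
direct = np.array([[np.sum(X.reshape(4, 4)[r:r+3, c:c+3] * W) for c in range(2)]
                   for r in range(2)])
print(np.allclose(Y.reshape(2, 2), direct))            # True

X_back = C.T @ Y                                       # transposed convolution as C^T Y
print(X_back.shape)                                    # (16,) -- same size as X ...
print(np.allclose(X_back, X))                          # ... but False: the values are not recovered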

4. (PyTorch version) The Conv2DTranspose calculation process

The transposed convolution has kernel size kernel(k), stride stride(s), and padding(p). Its calculation can be summarized in three steps:

  1. The first step: calculate the new input feature map;
  2. Step 2: Calculate the transposed convolution kernel;
  3. Step 3: Perform standard convolution operations.

4.1 Step 1: Calculate new input feature map

Insert zero elements into the input feature map $M$ to obtain the new input feature map $M^{\prime}$.

Taking the feature-map height (Height) as an example, the input feature map has height $H_{in}$, so there are $(H_{in}-1)$ gaps between adjacent positions.
Number of zero elements inserted between two adjacent positions: $s-1$, where $s$ is the stride.
Total number of zero elements inserted in the Height direction: $(H_{in}-1)*(s-1)$.
New input feature map size: $H_{in}^{\prime}=H_{in}+(H_{in}-1)*(s-1)$
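A tiny sketch of this zero-insertion step (an added example, assuming PyTorch) for $H_{in}=3$ and $s=2$:

import torch

x = torch.arange(1., 10.).reshape(1, 1, 3, 3)   # input feature map, H_in = 3
s = 2
h_new = 3 + (3 - 1) * (s - 1)                   # 3 + 2 = 5
m = torch.zeros(1, 1, h_new, h_new)
m[:, :, ::s, ::s] = x                           # original values with s-1 zeros in between
print(m[0, 0])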

4.2 Step 2: Calculate the transposed convolution kernel

Flip the standard convolution kernel $K$ up-down and left-right to obtain the transposed convolution kernel $K^{\prime}$.

Known:
standard convolution kernel size: $k$,
standard convolution stride: $s$,
standard convolution padding: $p$.

  1. Transposed convolution kernel size: $k^{\prime}=k$
  2. Transposed convolution stride: $s^{\prime}=1$ (this value is always 1)
  3. Transposed convolution padding: $p^{\prime}=k-p-1$ (how this formula arises is explained below)
    [Figure: the transposed convolution kernel and its parameters]

4.3 Step 3: Perform standard convolution operation

Use the transposed convolution kernel to perform a standard convolution operation on the new input feature map , and the result obtained is the result of the transposed convolution.

According to the standard convolution formula:
$$H_{out}=\frac{H_{in}^{\prime}+2p^{\prime}-k^{\prime}}{s^{\prime}}+1 \qquad (2)$$

$H_{in}^{\prime}=H_{in}+(H_{in}-1)*(s-1)$,
$k^{\prime}=k$,
$s^{\prime}=1$,
$p^{\prime}=k-p-1$

Substituting the transformation results of the first and second steps into the equation above gives:
$$H_{out}=\frac{(H_{in}+H_{in}*s-H_{in}-s+1)+2*(k-p-1)-k}{s^{\prime}}+1 \qquad (3)$$
Simplifying:
$$H_{out}=\frac{(H_{in}-1)*s+k-2p-1}{s^{\prime}}+1 \qquad (4)$$
Since the denominator $s^{\prime}=1$, the final result is:
$$H_{out}=(H_{in}-1)*s-2p+k \qquad (5.1)$$

In summary, the transposed-convolution output sizes in the Height and Width directions of the feature map are:
$$H_{out}=(H_{in}-1)\times stride[0]-2\times padding[0]+kernel\_size[0]\\ W_{out}=(W_{in}-1)\times stride[1]-2\times padding[1]+kernel\_size[1]$$
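As a quick check of these formulas (an added sketch, assuming PyTorch), here is an asymmetric example with different strides and paddings for the two directions:

import torch
import torch.nn as nn

x = torch.randn(1, 1, 3, 5)                     # H_in=3, W_in=5
tconv = nn.ConvTranspose2d(1, 1, kernel_size=(3, 4), stride=(2, 3), padding=(1, 2))
print(tconv(x).shape)
# H_out = (3-1)*2 - 2*1 + 3 = 5 ; W_out = (5-1)*3 - 2*2 + 4 = 12
# -> torch.Size([1, 1, 5, 12])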

4.4 Proof that $p^{\prime}=k-p-1$

Rearranging formula (5.1) gives:
$$H_{in}=\frac{H_{out}+2p-k}{s}+1 \qquad (5.2)$$

Comparing formula (5.2) with formula (1), we can see that Conv2D and Conv2DTranspose are inverses of each other with respect to their input and output shape sizes.

Note: torch.nn.ConvTranspose2d
The padding argument effectively adds dilation * (kernel_size - 1) - padding amount of zero padding to both sizes of the input. This is set so that when a Conv2d and a ConvTranspose2d are initialized with same parameters, they are inverses of each other in regard to the input and output shapes.

That is, the padding argument effectively adds $dilation*(kernel\_size-1)-padding$ zeros of padding to both sides of the input. This is arranged so that when Conv2d and ConvTranspose2d are initialized with the same parameters, their input and output shape sizes are mutually inverse.
Put simply, the role of the padding argument is to make the input and output shapes of Conv2d and ConvTranspose2d inverse to each other.

So how does the formula $p^{\prime}=k-p-1$ in the second step arise? It is in fact derived (in reverse) from the condition that "the input and output shapes of Conv2d and ConvTranspose2d are mutually inverse". A simple proof follows:

Known conditions:
$H_{in}^{\prime}=H_{in}+(H_{in}-1)*(s-1)$,
$k^{\prime}=k$,
$s^{\prime}=1$,
$p^{\prime}$ is unknown and to be determined.

Substituting the known conditions into formula (2), we get:
$$H_{out}=\frac{(H_{in}+H_{in}*s-H_{in}-s+1)+2*p^{\prime}-k}{s^{\prime}}+1$$
Simplifying:
$$H_{out}=(H_{in}-1)*s+2*p^{\prime}-k+2 \qquad (6)$$
Since "the input and output shapes of Conv2d and ConvTranspose2d are mutually inverse", swapping $H_{in}$ and $H_{out}$ in formula (6) must give the shape relation of the paired Conv2d:
$$H_{in}=(H_{out}-1)*s+2*p^{\prime}-k+2 \qquad (7)$$
Rearranging gives:
$$H_{out}=\frac{H_{in}-2*p^{\prime}+k-2}{s}+1 \qquad (8)$$
Comparing formula (8) with formula (1), we obtain:
$$2p-k=-2*p^{\prime}+k-2$$
Solving:
$$p^{\prime}=k-p-1 \qquad (9)$$

Q.E.D.

4.5 Conv2DTranspose example

[Figure: transposed convolution with k=3, s=2, p=1 applied to a 3×3 input]

Input feature map $M$: $H_{in}=3$.
Standard convolution kernel $K$: $k=3, s=2, p=1$.
New input feature map $M^{\prime}$: $H_{in}^{\prime}=3+(3-1)*(2-1)=3+2=5$. Note that after adding the padding $p^{\prime}$ it becomes 7.
Transposed convolution kernel $K^{\prime}$: $k^{\prime}=k, s^{\prime}=1, p^{\prime}=3-1-1=1$.
Final result of the transposed convolution: $H_{out}=(3-1)*2-2*1+3=5$

[Figure: the three-step computation for this example]
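The three steps for this example can also be reproduced in code (an added sketch, assuming PyTorch; the numeric values are arbitrary): insert zeros, flip the kernel, then run a standard convolution with s'=1 and p'=k-p-1, and compare the result against the built-in conv_transpose2d.

import torch
import torch.nn.functional as F

x = torch.arange(1., 10.).reshape(1, 1, 3, 3)     # input feature map M, H_in = 3
w = torch.arange(1., 10.).reshape(1, 1, 3, 3)     # standard convolution kernel K

# reference: the built-in transposed convolution with k=3, s=2, p=1
ref = F.conv_transpose2d(x, w, stride=2, padding=1)

# step 1: insert s-1 = 1 zero between neighbouring elements -> 5x5 new input M'
m = torch.zeros(1, 1, 5, 5)
m[:, :, ::2, ::2] = x

# step 2: flip the kernel up-down and left-right; s' = 1, p' = k-p-1 = 1
w_flip = torch.flip(w, dims=[2, 3])

# step 3: standard convolution on M' with the flipped kernel
out = F.conv2d(m, w_flip, stride=1, padding=1)

print(ref.shape, out.shape)                       # both torch.Size([1, 1, 5, 5])
print(torch.allclose(ref, out))                   # True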

5. (TensorFlow version) The Conv2DTranspose calculation process

To compute the TensorFlow version of Conv2DTranspose, you first need to construct output_shape. The output-size formula is:
$$o=s(i-1)+a+k-2p,\quad a\in\{0,\ldots,s-1\}$$
This formula is simply the inverse of the convolution output-size formula. The only reason the result is not unique (the term $a$) is that the rounding-down in the convolution formula means several input sizes can map to the same output size.
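A small sketch (an added example, assuming TensorFlow 2.x) of the freedom expressed by the term $a$: for the same 3×3 input with k=3, s=2 and padding='SAME', both output_shape 5 and 6 are accepted.

import tensorflow as tf

x = tf.random.normal([1, 3, 3, 1])
w = tf.random.normal([3, 3, 1, 1])                # (height, width, out_channels, in_channels)

for h in (5, 6):
    y = tf.nn.conv2d_transpose(x, w, output_shape=[1, h, h, 1],
                               strides=[1, 2, 2, 1], padding='SAME')
    print(y.shape)                                # (1, 5, 5, 1) and (1, 6, 6, 1)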

5.1 Step 1: Calculate new input feature map

Consistent with PyTorch.

5.2 Step 2: Calculate the transposed convolution kernel

Consistent with PyTorch.

5.3 Step 3: Perform standard convolution operation

The reason why the third step is different from PyTorch is that TensorFlow's padding algorithm is different from PyTorch, resulting in different outputs when performing standard convolution operations.
For an introduction to TensorFlow's padding algorithm, please refer to the blog: Understanding TensorFlow's padding algorithm in simple terms .

Taking the feature-map height (Height) as an example, TensorFlow's transposed-convolution formula splits into two cases:

  1. When $(H_{out}+2p-k)\%s=0$, the transposed convolution formula is:
    $$H_{out}=(H_{in}-1)*s-2p+k \qquad (10)$$

    [Figure: TF transposed convolution, case 1: 3×3 input → 5×5 output]

    As shown in the figure above, we choose an input of size 3×3, a kernel of size 3×3, $strides=2$ and $padding=1$, i.e. $i=3,k=3,s=2,p=1$; the output size is then $o=(3-1)\times2-2+3=5$.

  2. When $(H_{out}+2p-k)\%s\neq0$, the transposed convolution formula is:

    $$H_{out}=(H_{in}-1)*s-2p+k+(H_{out}+2p-k)\%s \qquad (11)$$

    [Figure: TF transposed convolution, case 2: 3×3 input → 6×6 output]

    As shown in the figure above, with the same input of size 3×3, a kernel of size 3×3, $strides=2$ and $padding=1$, i.e. $i=3,k=3,s=2,p=1$, the output size is $o=(3-1)\times2-2+3+1=6$.

In the formulas above, $2p=p\_top+p\_bottom$, where $p\_top$ and $p\_bottom$ denote the padding at the top and bottom in the Height direction.
In practice, $H_{out}$ is usually known in advance (it is passed in as output_shape), or $stride=1$, from which the relevant parameters ($p$, $H\_out$) can be determined.

6. Transposed convolution can only restore the size, not the values

Standard convolution operation:

import tensorflow as tf


value = tf.reshape(tf.constant([[1., 2., 3., 4., 5.],
                                [6., 7., 8., 9., 10.],
                                [11., 12., 13., 14., 15.],
                                [16., 17., 18., 19., 20.],
                                [21., 22., 23., 24., 25.]]), [1, 5, 5, 1])
filter = tf.reshape(tf.constant([[1., 0.],
                                 [0., 1.]]), [2, 2, 1, 1])
output = tf.nn.conv2d(value, filter, [1, 2, 2, 1], 'SAME')
print(output)

"""
tf.Tensor(
[[[[ 8.]
   [12.]
   [ 5.]]

  [[28.]
   [32.]
   [15.]]

  [[21.]
   [23.]
   [25.]]]], shape=(1, 3, 3, 1), dtype=float32)
"""

The result of the standard convolution is:
$$output=\begin{bmatrix}8&12&5\\28&32&15\\21&23&25\end{bmatrix}$$
We perform a transposed convolution operation on this result using exactly the same parameters as a standard convolution operation:

input = tf.reshape(tf.constant([[8., 12., 5.],
                                [28., 32., 15.],
                                [21., 23., 25.]]), [1, 3, 3, 1])

kernel = tf.reshape(tf.constant([[1., 0.],
                               [0., 1.]]), [2, 2, 1, 1])

output = tf.nn.conv2d_transpose(input=input,
                                filters=kernel,
                                output_shape=[1, 5, 5, 1],
                                strides=[1, 2, 2, 1],
                                padding='SAME')
print(output)
"""
tf.Tensor(
[[[[ 8.]
   [ 0.]
   [12.]
   [ 0.]
   [ 5.]]

  [[ 0.]
   [ 8.]
   [ 0.]
   [12.]
   [ 0.]]

  [[28.]
   [ 0.]
   [32.]
   [ 0.]
   [15.]]

  [[ 0.]
   [28.]
   [ 0.]
   [32.]
   [ 0.]]

  [[21.]
   [ 0.]
   [23.]
   [ 0.]
   [25.]]]], shape=(1, 5, 5, 1), dtype=float32)
"""

The result of the transposed convolution is:
$$output=\begin{bmatrix}8&0&12&0&5\\0&8&0&12&0\\28&0&32&0&15\\0&28&0&32&0\\21&0&23&0&25\end{bmatrix}$$
From this we can see that transposed convolution can only restore the size, not the values.

7. The stride in Conv2DTranspose

The figure below shows transposed convolution for different values of s and p:

[Figures: s=1, p=0, k=3 | s=2, p=0, k=3 | s=2, p=1, k=3]

7.1 When step size stride=1,p=0,k=3

[Figure: transposed convolution, s=1, p=0, k=3: (2,2) → (4,4)]

Input feature map (blue): $(H_{in},W_{in})=(2,2)$.
Given parameters: $kernel\_size(k)=3$, $stride(s)=1$, $padding(p)=0$.
New input feature map: $H_{in}^{\prime}=2+(2-1)*(1-1)=2$. As shown in the figure above, the new input feature map after zero insertion is (2,2).
Transposed convolution kernel: $kernel\_size(k^{\prime})=3$, $stride(s^{\prime})=1$, $padding(p^{\prime})=3-0-1=2$. As shown in the figure above, the padding is 2.
Output feature map (green): $(H_{out},W_{out})=(4,4)$.

Substituting into formula (5.1), we get:
$$H_{out}=(H_{in}-1)*s-2p+k=(2-1)*1-2*0+3=4$$

7.2 When step size stride=2,p=0,k=3

[Figure: transposed convolution, s=2, p=0, k=3: (2,2) → (5,5)]

Input feature map (blue): $(H_{in},W_{in})=(2,2)$.
Convolution kernel: $k=3$, $stride(s)=2$, $padding=0$.
New input feature map: $H_{in}^{\prime}=2+(2-1)*(2-1)=3$. As shown in the figure above, the new input feature map after zero insertion is (3,3).
Transposed convolution kernel: $kernel\_size(k^{\prime})=3$, $stride(s^{\prime})=1$, $padding(p^{\prime})=3-0-1=2$. As shown in the figure above, the padding is 2.
Output feature map (green): $(H_{out},W_{out})=(5,5)$.

Substituting into formula (5.1), we get:
$$H_{out}=(H_{in}-1)*s-2p+k=(2-1)*2-2*0+3=5$$

7.3 When step size stride=2,p=1,k=3

[Figure: transposed convolution, s=2, p=1, k=3: (3,3) → (5,5)]

Input feature map (blue): $(H_{in},W_{in})=(3,3)$.
Convolution kernel: $k=3$, $stride(s)=2$, $padding=1$.
New input feature map: $H_{in}^{\prime}=3+(3-1)*(2-1)=5$. As shown in the figure above, the new input feature map after zero insertion is (5,5).
Transposed convolution kernel: $kernel\_size(k^{\prime})=3$, $stride(s^{\prime})=1$, $padding(p^{\prime})=3-1-1=1$. As shown in the figure above, the padding is 1.
Output feature map (green): $(H_{out},W_{out})=(5,5)$.

Substituting into formula (5.1), we get:
$$H_{out}=(H_{in}-1)*s-2p+k=(3-1)*2-2*1+3=5$$
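A minimal check of the three cases in this section (an added sketch, assuming PyTorch):

import torch
import torch.nn as nn

cases = [
    (2, dict(kernel_size=3, stride=1, padding=0)),  # 7.1: (2,2) -> (4,4)
    (2, dict(kernel_size=3, stride=2, padding=0)),  # 7.2: (2,2) -> (5,5)
    (3, dict(kernel_size=3, stride=2, padding=1)),  # 7.3: (3,3) -> (5,5)
]
for h_in, kwargs in cases:
    x = torch.randn(1, 1, h_in, h_in)
    print(nn.ConvTranspose2d(1, 1, **kwargs)(x).shape)
# torch.Size([1, 1, 4, 4]), torch.Size([1, 1, 5, 5]), torch.Size([1, 1, 5, 5])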

8. Checkerboard Artifacts

Checkerboard Artifacts
Convolution operation summary (3): causes of and solutions to checkerboard artifacts from transposed convolution
Deconvolution and Checkerboard Artifacts

The checkerboard effect is the result of the "uneven overlap" of transposed convolution, which makes certain parts of the image darker than other parts.
[Figure: checkerboard artifacts produced by transposed convolution]
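The uneven overlap is easy to see in code (an added sketch, assuming PyTorch): with an all-ones input and an all-ones 3×3 kernel at stride 2, the output cells are covered by 1, 2 or 4 kernel positions, which is exactly the checkerboard pattern.

import torch
import torch.nn.functional as F

x = torch.ones(1, 1, 4, 4)
w = torch.ones(1, 1, 3, 3)
y = F.conv_transpose2d(x, w, stride=2)   # output values alternate between 1, 2 and 4
print(y[0, 0])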

9. Summary

  1. Conv2D feature-map transformation:

$$H_{out}=\frac{H_{in}+2p-k}{s}+1$$

  2. Conv2DTranspose feature-map transformation:

$$H_{out}=(H_{in}-1)*s-2p+k$$

  3. Standard convolution kernel: $s, p, k$; transposed convolution kernel: $s^{\prime}=1, p^{\prime}=k-p-1, k^{\prime}=k$.
  4. The stride of the original convolution determines the zero insertion in the first step: the expansion factor is tied to strides, and strides-1 zeros are inserted between neighbouring elements. The stride of the standard convolution in the third step is always 1.
  5. The input and output shapes of Conv2D and Conv2DTranspose are inverse to each other.
  6. Standard convolution turns a large map into a small map, e.g. (5,5) to (3,3); transposed convolution turns a small map into a large map, e.g. (3,3) to (5,5).

4. Relevant experience

1. (Keras) tf.keras.layers.Conv2DTranspose

TF official document: tf.keras.layers.Conv2DTranspose
TensorFlow function: tf.layers.Conv2DTranspose
The implementation mechanism and special case handling method of transposed convolution conv2d_transpose in tensorflow

function prototype

We take TensorFlow v2.14.0 as the example version to introduce transposed convolution.

tf.keras.layers.Conv2DTranspose(
    filters,
    kernel_size,
    strides=(1, 1),
    padding='valid',
    output_padding=None,
    data_format=None,
    dilation_rate=(1, 1),
    activation=None,
    use_bias=True,
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    kernel_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    bias_constraint=None,
    **kwargs
)

Parameter explanation

  • filters: Integer, the dimension of the output (that is, the number of output channels).
  • kernel_size: A tuple or a list of 2 positive integers, specifying the spatial dimensions of filters.
  • strides: A tuple or list of 2 positive integers specifying the stride of the convolution.
  • padding: A string, "valid" or "same", filling algorithm.
  • output_padding: An integer or a tuple/list of 2 integers, specifying the amount of padding along the height and width of the output tensor. If set to None (default), the output shape is inferred.
  • data_format: A string, either channels_last (default) or channels_first, the ordering of the input dimensions. channels_last corresponds to inputs with shape (batch, height, width, channels), while channels_first corresponds to inputs with shape (batch, channels, height, width).
  • dilation_rate: An integer or a tuple/list of 2 integers, specifying the dilation rate for all spatial dimensions of the dilated convolution.
  • activation: Specify the activation function. If set to "None" (default), the activation function will not be used.
  • use_bias: Boolean, indicating whether this layer uses bias.
  • kernel_initializer: Initializer of the convolution kernel weight matrix (see keras.initializers), the default is "glorot_uniform".
  • bias_initializer: Initializer for bias vector, default is "zeros".
  • kernel_regularizer: Regularizer applied to the convolution kernel weight matrix (see keras.regularizers).
  • bias_regularizer: Regularizer applied to the bias vector.
  • activity_regularizer: Regularizer applied to the output of the activation layer.
  • kernel_constraint: The constraint function applied to the convolution kernel.
  • bias_constraint: Constraint function applied to the bias vector.

Input shape

4D tensor with shape: (batch_size, channels, rows, cols) if data_format=channels_first or 4D tensor with shape: (batch_size, rows, cols, channels) if data_format=channels_last.

Output shape

4D tensor with shape: (batch_size, filters, new_rows, new_cols) if data_format=channels_first or 4D tensor with shape: (batch_size, new_rows, new_cols, filters) if data_format=channels_last.

new_rows = ((rows - 1) * strides[0] + kernel_size[0] - 2 * padding[0] +
output_padding[0])
new_cols = ((cols - 1) * strides[1] + kernel_size[1] - 2 * padding[1] +
output_padding[1])
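A quick shape check of these formulas (an added sketch, assuming tf.keras; the layer sizes are arbitrary):

import tensorflow as tf

layer = tf.keras.layers.Conv2DTranspose(filters=16, kernel_size=3,
                                        strides=2, padding='valid')
x = tf.random.normal([1, 7, 7, 8])
print(layer(x).shape)   # (1, 15, 15, 16): new_rows = (7-1)*2 + 3 = 15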

2. (TensorFlow)tf.nn.conv2d_transpose

TF official document: tf.nn.conv2d_transpose

tf.nn.conv2d_transpose is sometimes called “deconvolution” after (Zeiler et al., 2010), but is really the transpose (gradient) of conv2d rather than an actual deconvolution.

Core source code

Viewing the source code of tf.nn.conv2d_transpose, located at nn_ops.py#L2689-L2773, and following the calls, we find that the function ultimately points to nn_ops.py#L2607:

# The reverse of the convolution operation (backward derivation): given the output, compute the input
# https://github.com/tensorflow/tensorflow/blob/v2.14.0/tensorflow/python/ops/nn_ops.py#L2547-L2609

@tf_export(v1=["nn.conv2d_backprop_input"])
@dispatch.add_dispatch_support
def conv2d_backprop_input(  # pylint: disable=redefined-builtin,dangerous-default-value
    input_sizes,
    filter=None,
    out_backprop=None,
    strides=None,
    padding=None,
    use_cudnn_on_gpu=True,
    data_format="NHWC",
    dilations=[1, 1, 1, 1],
    name=None,
    filters=None):
  
  filter = deprecation.deprecated_argument_lookup(  # rename filters -> filter; the value is unchanged
      "filters", filters, "filter", filter)
  padding, explicit_paddings = convert_padding(padding)  # converts the padding
  return gen_nn_ops.conv2d_backprop_input(
      input_sizes, filter, out_backprop, strides, padding, use_cudnn_on_gpu,
      explicit_paddings, data_format, dilations, name)
# conv2d and conv2d_transpose have mutually inverse input/output shape sizes
# https://github.com/tensorflow/tensorflow/blob/v2.14.0/tensorflow/python/ops/nn_ops.py#L2689-L2773

@tf_export("nn.conv2d_transpose", v1=[])
@dispatch.add_dispatch_support
def conv2d_transpose_v2(
    input,  # pylint: disable=redefined-builtin
    filters,  # pylint: disable=redefined-builtin
    output_shape,
    strides,
    padding="SAME",
    data_format="NHWC",
    dilations=None,
    name=None):
    
  with ops.name_scope(name, "conv2d_transpose",
                      [input, filter, output_shape]) as name:
    if data_format is None:
      data_format = "NHWC"
    channel_index = 1 if data_format.startswith("NC") else 3

    strides = _get_sequence(strides, 2, channel_index, "strides")
    dilations = _get_sequence(dilations, 2, channel_index, "dilations")
    padding, explicit_paddings = convert_padding(padding)  

    return gen_nn_ops.conv2d_backprop_input(  # the backward derivation of convolution
        input_sizes=output_shape,
        filter=filters,
        out_backprop=input,
        strides=strides,
        padding=padding,
        explicit_paddings=explicit_paddings,
        data_format=data_format,
        dilations=dilations,
        name=name)

From the source code above we can see that the transposed convolution operation only converts the padding; the filter is left unchanged, and the call ultimately points to the backward derivation (gradient) of the standard convolution, conv2d_backprop_input.
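This "gradient of conv2d" relationship can be checked numerically (an added sketch, assuming TensorFlow 2.x): the transposed convolution of a tensor g equals the gradient of sum(conv2d(x, w) * g) with respect to x.

import tensorflow as tf

x = tf.random.normal([1, 5, 5, 1])
w = tf.random.normal([3, 3, 1, 1])
g = tf.random.normal([1, 3, 3, 1])                # plays the role of "out_backprop"

with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.nn.conv2d(x, w, strides=[1, 2, 2, 1], padding='SAME')
    loss = tf.reduce_sum(y * g)
grad_x = tape.gradient(loss, x)

t = tf.nn.conv2d_transpose(g, w, output_shape=[1, 5, 5, 1],
                           strides=[1, 2, 2, 1], padding='SAME')
print(bool(tf.reduce_all(tf.abs(grad_x - t) < 1e-5)))   # True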

2.1 Function prototype

tf.nn.conv2d_transpose(
    input,
    filters,
    output_shape,
    strides,
    padding='SAME',
    data_format='NHWC',
    dilations=None,
    name=None
)

Parameter explanation

  • input: A 4-D Tensor of type float and shape [batch, height, width, in_channels] for NHWC data format or [batch, in_channels, height, width] for NCHW data format.
  • filters: A 4-D Tensor with the same type as input and shape [height, width, output_channels, in_channels]. filter's in_channels dimension must match that of input.
  • output_shape: A 1-D Tensor representing the output shape of the deconvolution op.
  • strides: An integer or a list of 1, 2, or 4 positive integers, specifying the stride of the convolution.
  • padding: Either the string "SAME" or "VALID", indicating the type of padding algorithm to use.
  • data_format: A string. 'NHWC' and 'NCHW' are supported.
  • dilations: An integer or a list of 1, 2, or 4 positive integers (default 1), specifying the dilation rate for all spatial dimensions of the dilated convolution.
  • name: Optional name for the returned tensor.

2.2 Code examples

import tensorflow as tf
import numpy as np

def test_conv2d_transpose():
    # input batch shape = (1, 2, 2, 1) -> (batch_size, height, width, channels) - 2x2x1 image in batch of 1
    x = tf.constant(np.array([[
        [[1], [2]], 
        [[3], [4]]
    ]]), tf.float32)

    # filters shape = (3, 3, 1, 1) -> (height, width, input_channels, output_channels) - 3x3x1 filter
    f = tf.constant(np.array([
        [[[1]], [[1]], [[1]]], 
        [[[1]], [[1]], [[1]]], 
        [[[1]], [[1]], [[1]]]
    ]), tf.float32)

    conv = tf.nn.conv2d_transpose(x, f, output_shape=(1, 4, 4, 1), strides=[1, 2, 2, 1], padding='SAME')

    result = conv.numpy()  # TF2 eager execution; no tf.Session is needed

    assert (np.array([[
        [[1.0], [1.0],  [3.0], [2.0]],
        [[1.0], [1.0],  [3.0], [2.0]],
        [[4.0], [4.0], [10.0], [6.0]],
        [[3.0], [3.0],  [7.0], [4.0]]]]) == result).all()

2.3 Code analysis

known conditions:

# 2*2*1 ——> 4*4*1
(in_height, in_width)=(2, 2)
(filter_height, filter_width)=(3, 3)
(strides[1], strides[2])=(2, 2)

According to TensorFlow’s padding algorithm, it can be known:

in_height % strides[1] = 2%2 = 0
pad_along_height = max(filter_height - stride_height, 0)=max(3-2, 0)=1

pad_top = pad_along_height // 2 = 1 // 2 = 0
pad_bottom = pad_along_height - pad_top = 1-0 = 1

Then find the padding in the Height direction:

(pad_top, pad_bottom)=(0, 1)

In the same way, the padding in the Width direction is obtained as:

(pad_left, pad_right)=(0, 1)

After the transposed convolution, the output size is doubled, i.e. $H_{out}=4$, so:
$$(H_{out}+2p-k)\%s=(4+(0+1)-3)\%2=0$$
Substituting the known conditions into formula (10), we get:
$$H\_out=(2-1)*2-(0+1)+3=4$$
Similarly, $W\_out=(2-1)*2-(0+1)+3=4$.

In summary, the output size is $(4,4,1)$, consistent with the result verified by the code.

3. (PyTorch)torch.nn.ConvTranspose2d

torch.nn.ConvTranspose2d

3.1 Function prototype

CLASS torch.nn.ConvTranspose2d(in_channels, 
                               out_channels, 
                               kernel_size, 
                               stride=1, 
                               padding=0, 
                               output_padding=0, 
                               groups=1, 
                               bias=True, 
                               dilation=1, 
                               padding_mode='zeros', 
                               device=None, 
                               dtype=None)

Parameter explanation

  • in_channels (int) – Number of channels in the input image
  • out_channels (int) – Number of channels produced by the convolution
  • kernel_size (int or tuple) – Size of the convolving kernel
  • stride (int or tuple, optional) – Stride of the convolution. Default: 1
  • padding (int or tuple, optional) – dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of each dimension in the input. Default: 0
  • output_padding (int or tuple, optional) – Additional size added to one side of each dimension in the output shape. Default: 0. Note that output_padding is only used to determine the output shape; it does not actually add zero padding to the output. In effect, the computed output feature map gains a few extra rows or columns in the height and width directions (note that they are added on one side only, not on both sides). A short sketch after the Variables list below illustrates this.
  • groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1. This parameter is only needed for grouped convolution; the default of 1 is an ordinary convolution.
  • bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
  • dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1. This parameter is only needed for dilated (atrous) convolution; the default of 1 is an ordinary convolution.

Variables

  • weight (Tensor) – the learnable weights of the module, of shape $(\text{in\_channels},\ \frac{\text{out\_channels}}{\text{groups}},\ \text{kernel\_size[0]},\ \text{kernel\_size[1]})$.
  • bias (Tensor) – the learnable bias of the module of shape (out_channels) If bias is True.
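A short sketch of output_padding (an added example, assuming PyTorch): with k=3, s=2, p=1 both a 5×5 and a 6×6 input are mapped to a 3×3 output by Conv2d, so ConvTranspose2d needs output_padding to select the 6×6 shape.

import torch
import torch.nn as nn

y = torch.randn(1, 1, 3, 3)
t0 = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, padding=1)
t1 = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, padding=1, output_padding=1)
print(t0(y).shape, t1(y).shape)   # (1, 1, 5, 5) and (1, 1, 6, 6)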

3.2 Code examples

The following uses the PyTorch framework to simulate the transposed convolution operation with s=1, p=0, k=3:

[Figure: transposed convolution with s=1, p=0, k=3 on a 2×2 input]
In the code, the transposed_conv_official function computes the result using the official transposed convolution, while the transposed_conv_self function pads the input feature map by hand according to the steps described above and then applies a standard convolution.

import torch
import torch.nn as nn


def transposed_conv_official():
    # input feature map
    feature_map = torch.as_tensor([[1, 0],
                                   [2, 1]], dtype=torch.float32).reshape([1, 1, 2, 2])
    print(feature_map)
    
    # instantiate the transposed convolution object
    trans_conv = nn.ConvTranspose2d(in_channels=1, out_channels=1,
                                    kernel_size=3, stride=1, bias=False)
    
    # define the standard convolution kernel (note: this is the standard kernel, not the transposed kernel)
    trans_conv.load_state_dict({"weight": torch.as_tensor([[1, 0, 1],
                                                           [0, 1, 1],
                                                           [1, 0, 0]], dtype=torch.float32).reshape([1, 1, 3, 3])})
    
    print(trans_conv.weight)
    
    # run the transposed convolution
    output = trans_conv(feature_map)
    print(output)


def transposed_conv_self():
    # the new input feature map (after zero insertion and padding)
    feature_map = torch.as_tensor([[0, 0, 0, 0, 0, 0],
                                   [0, 0, 0, 0, 0, 0],
                                   [0, 0, 1, 0, 0, 0],
                                   [0, 0, 2, 1, 0, 0],
                                   [0, 0, 0, 0, 0, 0],
                                   [0, 0, 0, 0, 0, 0]], dtype=torch.float32).reshape([1, 1, 6, 6])
    print(feature_map)
    
    # instantiate a standard convolution
    conv = nn.Conv2d(in_channels=1, out_channels=1,
                     kernel_size=3, stride=1, bias=False)
    
    # flip the standard kernel up-down and left-right to obtain the transposed convolution kernel
    conv.load_state_dict({"weight": torch.as_tensor([[0, 0, 1],
                                                     [1, 1, 0],
                                                     [1, 0, 1]], dtype=torch.float32).reshape([1, 1, 3, 3])})
    print(conv.weight)
    
    # run the standard convolution
    output = conv(feature_map)
    print(output)


def main():
    transposed_conv_official()
    print("---------------")
    transposed_conv_self()


if __name__ == '__main__':
    main()

Output results

tensor([[[[1., 0.],
          [2., 1.]]]])
Parameter containing:
tensor([[[[1., 0., 1.],
          [0., 1., 1.],
          [1., 0., 0.]]]], requires_grad=True)
tensor([[[[1., 0., 1., 0.],
          [2., 2., 3., 1.],
          [1., 2., 3., 1.],
          [2., 1., 0., 0.]]]], grad_fn=<SlowConvTranspose2DBackward>)
---------------
tensor([[[[0., 0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0., 0.],
          [0., 0., 1., 0., 0., 0.],
          [0., 0., 2., 1., 0., 0.],
          [0., 0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0., 0.]]]])
Parameter containing:
tensor([[[[0., 0., 1.],
          [1., 1., 0.],
          [1., 0., 1.]]]], requires_grad=True)
tensor([[[[1., 0., 1., 0.],
          [2., 2., 3., 1.],
          [1., 2., 3., 1.],
          [2., 1., 0., 0.]]]], grad_fn=<ThnnConv2DBackward>)

Process finished with exit code 0

3.3 Code analysis

Input feature map $M$: $H_{in}=2$

Standard convolution kernel $K$: $k=3, s=1, p=0$

New input feature map $M^{\prime}$: $H_{in}^{\prime}=2+(2-1)*(1-1)=2$

Transposed convolution kernel $K^{\prime}$: $k^{\prime}=k, s^{\prime}=1, p^{\prime}=3-0-1=2$

Final result of the transposed convolution: $H_{out}=(2-1)*1-2*0+3=4$

4. DCGAN

Paper: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Deep Convolutional Generative Adversarial Network

4.1 Code examples

The generator G in the DCGAN network uses tf.keras.layers.Conv2DTranspose (upsampling) layers to produce an image from a seed (random noise). Start with a Dense layer that takes this seed as input, then upsample several times until the desired image size of 28x28x1 is reached.

def make_generator_model():
    model = tf.keras.Sequential()  # create the model instance
    # the first layer must specify the input dimension; the batch size is not restricted
    model.add(layers.Dense(7*7*256, use_bias=False, input_shape=(100,)))  # Dense acts as the fully connected input layer; its output is reshaped to 7x7x256 below
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Reshape((7, 7, 256)))
    assert model.output_shape == (None, 7, 7, 256)  # Note: None is the batch size

    # transform to 7*7*128
    model.add(layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False))
    assert model.output_shape == (None, 7, 7, 128)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
	
    # transform to 14*14*64
    model.add(layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False))
    assert model.output_shape == (None, 14, 14, 64)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
	
    # transform to 28*28*1
    model.add(layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same', use_bias=False, activation='tanh'))
    assert model.output_shape == (None, 28, 28, 1)

    return model
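A minimal usage sketch (an added example; it assumes the imports used alongside the DCGAN tutorial, i.e. import tensorflow as tf and from tensorflow.keras import layers):

generator = make_generator_model()
noise = tf.random.normal([1, 100])
generated_image = generator(noise, training=False)
print(generated_image.shape)   # (1, 28, 28, 1)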

4.2 Code analysis

step1:7*7*256 ——> 7*7*128

Analyze layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False):

known conditions:

# 7*7*256 ——> 7*7*128
(in_height, in_width)=(7, 7)
(filter_height, filter_width)=(5, 5)
(strides[1], strides[2])=(1, 1)

According to TensorFlow’s padding algorithm, it can be known:

in_height % strides[1] = 7%1 = 0
pad_along_height = max(filter_height - stride_height, 0)=max(5-1, 0)=4

pad_top = pad_along_height // 2 = 4 // 2 = 2
pad_bottom = pad_along_height - pad_top = 4-2 = 2

Then find the padding in the Height direction:

(pad_top, pad_bottom)=(2, 2)

In the same way, the padding in the Width direction is obtained as:

(pad_left, pad_right)=(2, 2)

Since $s=1$, the condition $(H_{out}+2p-k)\%s=0$ holds, so substituting the known conditions into formula (10) gives:
$$H\_out=(7-1)*1-(2+2)+5=7$$
Similarly, $W\_out=(7-1)*1-(2+2)+5=7$.

In summary, the output size is $(7,7,128)$, consistent with the result verified by the code.

step2:7*7*128 ——> 14*14*64

Analyze layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False):

known conditions:

# 7*7*128 ——> 14*14*64
(in_height, in_width)=(7, 7)
(filter_height, filter_width)=(5, 5)
(strides[1], strides[2])=(2, 2)

According to TensorFlow’s padding algorithm, it can be known:

in_height % strides[1] = 7%2 = 1
pad_along_height = max(filter_height - (in_height % stride_height), 0)=max(5-7%2, 0)=4

pad_top = pad_along_height // 2 = 4 // 2 = 2
pad_bottom = pad_along_height - pad_top = 4-2 = 2

Then find the padding in the Height direction:

(pad_top, pad_bottom)=(2, 2)

In the same way, the padding in the Width direction is obtained as:

(pad_left, pad_right)=(2, 2)

After the transposed convolution, the output size is doubled, i.e. $H_{out}=14$, so:
$$(H_{out}+2p-k)\%s=(14+(2+2)-5)\%2=1$$
Substituting the known conditions into formula (11), we get:
$$H\_out=(7-1)*2-(2+2)+5+(14+(2+2)-5)\%2=14$$
Similarly, $W\_out=(7-1)*2-(2+2)+5+(14+(2+2)-5)\%2=14$.

In summary, the output size is $(14,14,64)$, consistent with the result verified by the code.

step3:14*14*64 ——> 28*28*1

Analyze layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same', use_bias=False, activation='tanh'):

known conditions:

# 14*14*64 ——> 28*28*1
(in_height, in_width)=(14, 14)
(filter_height, filter_width)=(5, 5)
(strides[1], strides[2])=(2, 2)

According to TensorFlow’s padding algorithm, it can be known:

in_height % strides[1] = 14%2 = 0
pad_along_height = max(filter_height - stride_height, 0)=max(5-2, 0)=3

pad_top = pad_along_height // 2 = 3 // 2 = 1
pad_bottom = pad_along_height - pad_top = 3-1 = 2

Then find the padding in the Height direction:

(pad_top, pad_bottom)=(1, 2)

In the same way, the padding in the Width direction is obtained as:

(pad_left, pad_right)=(1, 2)

After the transposed convolution, the output size is doubled, i.e. $H_{out}=28$, so:
$$(H_{out}+2p-k)\%s=(28+(1+2)-5)\%2=0$$
Substituting the known conditions into formula (10), we get:
$$H\_out=(14-1)*2-(1+2)+5=28$$
Similarly, $W\_out=(14-1)*2-(1+2)+5=28$.

In summary, the output size is $(28,28,1)$, consistent with the result verified by the code.

Source: blog.csdn.net/m0_37605642/article/details/135280661