If you review the past and learn the new, you can become a teacher!
1. Reference materials
Paper: A Guide to Convolution Arithmetic for Deep Learning
GitHub source code: Convolution arithmetic
Bilibili video: Transposed Convolution
Transposed Convolution
[Keras/TensorFlow/PyTorch] A plain-language explanation of Conv2D and Conv2DTranspose; what is deconvolution?
The deconvolution operation Conv2DTranspose
2. Standard convolution (Conv2D)
1. Conv2D calculation formula
The standard convolution output-size formula is:
o=\left\lfloor\frac{i+2p-k}{s}\right\rfloor+1 \quad \begin{array}{l}i=\textit{size of input}\\o=\textit{size of output}\\p=\textit{padding}\\k=\textit{size of kernel}\\s=\textit{stride}\end{array}
where \lfloor\cdot\rfloor denotes the floor (round-down) operator.
Taking the feature-map height as an example, the output height after the convolution operation is:
H_{out}=\frac{H_{in}+2p-k}{s}+1 \quad (1)
2. The stride in Conv2D
2.1 When stride=1, p=0, k=3
Input feature map (blue): (H_{in}, W_{in}) = (4, 4).
Parameters: kernel\_size(k)=3, stride(s)=1, padding(p)=0.
Output feature map (green): (H_{out}, W_{out}) = (2, 2).
Substituting into formula (1):
H_{out}=\frac{H_{in}+2p-k}{s}+1=\frac{4+2\times 0-3}{1}+1=2
2.2 When stride=2, p=1, k=3
Input feature map (blue): (H_{in}, W_{in}) = (5, 5).
Parameters: kernel\_size(k)=3, stride(s)=2, padding(p)=1.
Output feature map (green): (H_{out}, W_{out}) = (3, 3).
Substituting into formula (1):
H_{out}=\frac{H_{in}+2p-k}{s}+1=\frac{5+2\times 1-3}{2}+1=3
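Formula (1) is easy to check in code. A minimal sketch in plain Python, with floor division standing in for the floor operator, reproduces both examples above:

```python
def conv2d_out_size(i: int, k: int, s: int, p: int) -> int:
    """Output size of a standard convolution: o = floor((i + 2p - k) / s) + 1."""
    return (i + 2 * p - k) // s + 1

# Section 2.1: i=4, k=3, s=1, p=0
print(conv2d_out_size(4, 3, 1, 0))  # 2
# Section 2.2: i=5, k=3, s=2, p=1
print(conv2d_out_size(5, 3, 2, 1))  # 3
```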
3. Transposed convolution (Conv2DTranspose)
1. Introduction
For many tasks (semantic segmentation, autoencoders, the generator in a GAN, etc.), we often want to perform the opposite transformation of a standard convolution, that is, upsampling. In semantic segmentation, for example, an encoder first extracts feature maps, and a decoder then restores the original image size so that every pixel of the original image can be classified.
Traditional upsampling methods apply interpolation schemes or hand-crafted rules. Modern architectures instead let the network learn the appropriate transformation automatically, without human intervention. For this we can use transposed convolution.
2. Misconceptions about the name "transposed convolution"
This operation is sometimes called "deconvolution" after (Zeiler et al., 2010), but it is really the transpose (gradient) of conv2d rather than an actual deconvolution.
Deconvolutional Networks: Zeiler et al., 2010 (pdf)
Transposed convolution is also called "deconvolution" (deconv) or "inverse convolution". However, transposed convolution is currently the most precise and mainstream name, because it best describes the computation performed by Conv2DTranspose, while the other names are easily misleading. In mainstream deep learning frameworks such as TensorFlow, PyTorch, and Keras, the function names are all variants of conv_transpose. Therefore, before learning transposed convolution we should settle on the standard name, and when others speak of "deconvolution" we should help correct them, so that the incorrect naming is submerged in the long river of history as soon as possible.
Expressed as a formula, the standard convolution operation can be written as y = Cx, where C is the convolution matrix and x is the (flattened) input. A true inverse convolution, in the mathematical sense, would be x = C^{-1}y; the actual relationship behind "deconvolution", however, is x' = C^{T}y. The formula itself shows that the so-called "deconvolution" is more accurately called "transposed convolution".
Let us first explain why people like to call transposed convolution "deconvolution" or "inverse convolution". Consider an example: a 4×4 input passed through a 3×3 kernel with an ordinary convolution (no padding, stride=1) yields a 2×2 output. A transposed convolution takes a 2×2 input through a kernel of the same 3×3 size and yields a 4×4 output, which looks like the reverse of the ordinary convolution. Just as subtraction reverses addition and division reverses multiplication, people naturally assume the two operations form an invertible pair. But transposed convolution is not the inverse of convolution (convolution is in general irreversible); transposed convolution is itself a convolution. It is not the complete inverse of the forward convolution: it cannot restore the values of the input matrix, only its size (shape). This is where the name "transposed convolution" comes from, rather than "deconvolution" or "inverse convolution". Bad names easily mislead people.
In some literature, transposed convolution is also called convolution with fractional strides, backwards strided convolution, or Deconvolution, but the name Deconvolution is misleading and not recommended. The names Conv2DTranspose and convolution with fractional strides are strongly preferred; they correspond to the code version and the academic-paper version respectively.
I think transpose_conv2d or conv2d_transpose are the cleanest names.
3. Introduction to Conv2DTranspose
3.1 The concept of Conv2DTranspose
Transposed convolution is a special kind of forward convolution. It first enlarges the input by inserting zero elements at a certain ratio, then flips the convolution kernel (rotates it 180°), and finally performs a forward convolution. Transposed convolution is common in semantic segmentation and generative adversarial networks (GANs); its main role is upsampling (UpSampling).
3.2 Comparison of Conv2D and Conv2DTranspose
There is a big difference between transposed convolution and standard convolution. Direct convolution uses a "small window" to look at a "big world", while transposed convolution uses part of a "big window" to look at a "small world".
In standard convolution (a large image becomes a small image), the input is (5,5), the stride is (2,2), and the output is (3,3).
In the transposed convolution operation (a small image becomes a large image), the input is (3,3) and the output is (5,5).
3.3 Use matrix multiplication to describe transposed convolution
For an input matrix X and an output matrix Y, the standard convolution can be described with a matrix operation:
Y = CX
Transposed convolution reverses this mapping, that is, it obtains an X-shaped result from C and Y. From the sizes of the matrices we can easily obtain the transposed-convolution computation:
X' = C^{T}Y
If you substitute actual numbers, you will find that transposed convolution only restores the size of X; it cannot restore the element values of X. A proof of this conclusion is given later.
3.4 Mathematical derivation of transposed convolution
Transposed Convolution
Peeling back the cocoon: understanding transposed convolution (deconvolution)
Define an input matrix input of size 4×4:
input=\begin{bmatrix}x_1&x_2&x_3&x_4\\x_5&x_6&x_7&x_8\\x_9&x_{10}&x_{11}&x_{12}\\x_{13}&x_{14}&x_{15}&x_{16}\end{bmatrix}
A standard convolution kernel kernel of size 3×3:
kernel=\begin{bmatrix}w_{0,0}&w_{0,1}&w_{0,2}\\w_{1,0}&w_{1,1}&w_{1,2}\\w_{2,0}&w_{2,1}&w_{2,2}\end{bmatrix}
Let strides=1 and padding=0, i.e. i=4, k=3, s=1, p=0. Then, according to formula (1), the output matrix output is:
output=\begin{bmatrix}y_1&y_2\\y_3&y_4\end{bmatrix}
Here we change the representation: we expand the input matrix input and the output matrix output into column vectors X and Y, whose dimensions are 16×1 and 4×1 respectively.
Expand the input matrix input into a 16×1 column vector X:
X=\begin{bmatrix}x_1&x_2&x_3&x_4&x_5&x_6&x_7&x_8&x_9&x_{10}&x_{11}&x_{12}&x_{13}&x_{14}&x_{15}&x_{16}\end{bmatrix}^T
Expand the output matrix output into a 4×1 column vector Y:
Y=\begin{bmatrix}y_1&y_2&y_3&y_4\end{bmatrix}^T
We can then describe the standard convolution with a matrix operation, where the matrix C represents the convolution kernel arranged as a sparse matrix:
Y = CX
After derivation, this sparse matrix C has dimensions 4×16:
C=\begin{bmatrix}
w_{0,0}&w_{0,1}&w_{0,2}&0&w_{1,0}&w_{1,1}&w_{1,2}&0&w_{2,0}&w_{2,1}&w_{2,2}&0&0&0&0&0\\
0&w_{0,0}&w_{0,1}&w_{0,2}&0&w_{1,0}&w_{1,1}&w_{1,2}&0&w_{2,0}&w_{2,1}&w_{2,2}&0&0&0&0\\
0&0&0&0&w_{0,0}&w_{0,1}&w_{0,2}&0&w_{1,0}&w_{1,1}&w_{1,2}&0&w_{2,0}&w_{2,1}&w_{2,2}&0\\
0&0&0&0&0&w_{0,0}&w_{0,1}&w_{0,2}&0&w_{1,0}&w_{1,1}&w_{1,2}&0&w_{2,0}&w_{2,1}&w_{2,2}
\end{bmatrix}
The above matrix operation is shown in the figure below:
A transposed convolution reverses the shape mapping of the forward convolution, that is, it obtains an X-shaped result from C and Y:
X' = C^{T}Y
The new sparse matrix is C^T, with dimensions 16×4; the figure below visualizes the convolution after transposition. Note that the weight matrix used in a transposed convolution layer does not necessarily come from the original standard convolution matrix; only its shape matches that of the transposed matrix C^T.
Then, by reordering the 16×1 output vector, we obtain a 4×4 output matrix from the 2×2 input matrix.
Note: if you plug in numbers, you will find that transposed convolution only restores the size of X; it cannot restore the element values of X. A proof of this conclusion is given later.
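The shape-only claim above can be checked numerically. The sketch below (plain Python; the numeric kernel and input values are hypothetical, chosen only for illustration) builds the 4×16 sparse matrix C for a 4×4 input and a 3×3 stride-1 kernel, computes Y = CX, then X' = CᵀY, and shows that X' has X's size but not its values:

```python
def build_conv_matrix(w, in_size=4, k=3):
    """Build the sparse (out*out) x (in*in) matrix C of a stride-1, no-padding conv."""
    out = in_size - k + 1
    C = [[0.0] * (in_size * in_size) for _ in range(out * out)]
    for r in range(out):                 # output row index
        for c in range(out):             # output column index
            row = r * out + c
            for i in range(k):
                for j in range(k):
                    C[row][(r + i) * in_size + (c + j)] = w[i][j]
    return C

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # hypothetical kernel values
X = [float(i + 1) for i in range(16)]                     # input flattened to 16x1

C = build_conv_matrix(w)
Y = matvec(C, X)                    # standard convolution: 4 values (2x2 output)
X_back = matvec(transpose(C), Y)    # transposed convolution: 16 values (4x4 shape)

print(len(Y), len(X_back))  # 4 16 -> the 4x4 shape is restored
print(X_back == X)          # False -> the element values are not restored
```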
4. (PyTorch version) Conv2DTranspose calculation process
Let the transposed convolution have kernel size kernel (k), stride stride (s), and padding (p). Its computation can then be summarized in three steps:
- Step 1: compute the new input feature map;
- Step 2: compute the transposed convolution kernel;
- Step 3: perform a standard convolution operation.
4.1 Step 1: Calculate new input feature map
Interpolate zero elements into the input feature map M to obtain the new input feature map M'.
Taking the feature-map height as an example: the input height is H_{in}, so there are (H_{in}-1) gaps between adjacent positions.
Number of zeros interpolated between two adjacent positions: s-1, where s is the stride.
Total number of zeros interpolated along the height: (H_{in}-1)*(s-1).
New input feature-map size: H_{in}' = H_{in} + (H_{in}-1)*(s-1).
4.2 Step 2: Calculate the transposed convolution kernel
Flip the standard convolution kernel K up-down and left-right to obtain the transposed convolution kernel K'.
Known: standard kernel size k, standard kernel stride s, standard kernel padding p.
- Transposed kernel size: k' = k.
- Transposed kernel stride: s' = 1 (this value is always 1).
- Transposed kernel padding: p' = k - p - 1 (how this formula arises is explained below).
4.3 Step 3: Perform standard convolution operation
Use the transposed convolution kernel to perform a standard convolution on the new input feature map; the result is the transposed-convolution output.
According to the calculation formula of standard convolution:
H_{out}=\frac{H_{in}'+2p'-k'}{s'}+1 \quad (2)
H_{in}' = H_{in} + (H_{in}-1)*(s-1),
k' = k,
s' = 1,
p' = k - p - 1.
Substituting the transformation results in the first and second steps into the above equation, we can get:
H_{out}=\frac{(H_{in}+H_{in}s-H_{in}-s+1)+2(k-p-1)-k}{s'}+1 \quad (3)
Simplifying, we get:
H_{out}=\frac{(H_{in}-1)s+k-2p-1}{s'}+1 \quad (4)
Since the stride in the denominator is s' = 1, the final result is:
H_{out}=(H_{in}-1)s-2p+k \quad (5.1)
In summary, the transposed-convolution output sizes along the feature map's Height and Width are:
H_{out}=(H_{in}-1)\times stride[0]-2\times padding[0]+kernel\_size[0]
W_{out}=(W_{in}-1)\times stride[1]-2\times padding[1]+kernel\_size[1]
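These two formulas can be wrapped in a small helper. A minimal sketch in plain Python, with parameter names mirroring the PyTorch convention (output_padding and dilation ignored):

```python
def conv_transpose2d_out_size(h_in, w_in, kernel_size, stride, padding):
    """H_out = (H_in - 1)*stride[0] - 2*padding[0] + kernel_size[0], likewise for W."""
    h_out = (h_in - 1) * stride[0] - 2 * padding[0] + kernel_size[0]
    w_out = (w_in - 1) * stride[1] - 2 * padding[1] + kernel_size[1]
    return h_out, w_out

# Example from section 4.5: H_in=3, k=3, s=2, p=1 -> H_out=5
print(conv_transpose2d_out_size(3, 3, (3, 3), (2, 2), (1, 1)))  # (5, 5)
```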
4.4 Proof that p' = k - p - 1
Rearranging formula (5.1) gives:
H_{in}=\frac{H_{out}+2p-k}{s}+1 \quad (5.2)
From formula (5.2) and formula (1), it can be seen that Conv2D and Conv2DTranspose are inverses of each other in their input and output shapes.
Note: the torch.nn.ConvTranspose2d documentation states: "The padding argument effectively adds dilation * (kernel_size - 1) - padding amount of zero padding to both sizes of the input. This is set so that when a Conv2d and a ConvTranspose2d are initialized with same parameters, they are inverses of each other in regard to the input and output shapes."
Simply put, the padding parameter makes the input and output shapes of Conv2d and ConvTranspose2d inverses of each other.
How does the formula p' = k - p - 1 in step 2 arise? It is derived in reverse from the condition that Conv2d and ConvTranspose2d are shape inverses. It can be proved simply as follows.
Known conditions:
H_{in}' = H_{in} + (H_{in}-1)*(s-1),
k' = k,
s' = 1,
p' is the unknown to be determined.
Substituting the known conditions into formula (2), we get:
H_{out}=\frac{(H_{in}+H_{in}s-H_{in}-s+1)+2p'-k}{s'}+1
Simplifying, we get:
H_{out}=(H_{in}-1)s+2p'-k+2 \quad (6)
Because "the input and output shapes of Conv2d and ConvTranspose2d are inverses of each other", we can swap H_{in} and H_{out} in (6) to get:
H_{in}=(H_{out}-1)s+2p'-k+2 \quad (7)
Rearranging gives:
H_{out}=\frac{H_{in}-2p'+k-2}{s}+1 \quad (8)
Comparing formula (8) with formula (1) gives:
2p-k=-2p'+k-2
Solving:
p'=k-p-1 \quad (9)
Q.E.D.
4.5 Conv2DTranspose example
Input feature map M: H_{in}=3.
Standard convolution kernel K: k=3, s=2, p=1.
New input feature map M': H_{in}' = 3+(3-1)*(2-1) = 3+2 = 5. Note that after adding the padding p' it becomes 7.
Transposed convolution kernel K': k'=k, s'=1, p'=3-1-1=1.
Final transposed-convolution result: H_{out}=(3-1)*2-2*1+3=5.
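The three steps can be carried out by hand for this example. The sketch below (plain Python; the kernel values are hypothetical) inserts s−1 zeros between elements, pads with p' = k − p − 1, flips the kernel, and runs a stride-1 convolution, landing on the predicted 5×5 output:

```python
def insert_zeros(m, s):
    """Step 1: insert s-1 zeros between neighboring elements (both directions)."""
    n = len(m)
    size = n + (n - 1) * (s - 1)
    out = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            out[i * s][j * s] = m[i][j]
    return out

def pad(m, p):
    """Zero-pad p cells on every side."""
    n = len(m)
    out = [[0.0] * (n + 2 * p) for _ in range(n + 2 * p)]
    for i in range(n):
        for j in range(n):
            out[i + p][j + p] = m[i][j]
    return out

def flip(k):
    """Step 2: flip the kernel up-down and left-right (180-degree rotation)."""
    return [row[::-1] for row in k[::-1]]

def conv2d(m, k):
    """Step 3: plain stride-1, no-padding convolution."""
    n, kk = len(m), len(k)
    o = n - kk + 1
    return [[sum(m[i + a][j + b] * k[a][b] for a in range(kk) for b in range(kk))
             for j in range(o)] for i in range(o)]

def conv_transpose2d(m, kernel, s, p):
    new_in = pad(insert_zeros(m, s), len(kernel) - p - 1)  # p' = k - p - 1
    return conv2d(new_in, flip(kernel))

x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]   # H_in = 3
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # hypothetical 3x3 kernel
y = conv_transpose2d(x, w, s=2, p=1)
print(len(y), len(y[0]))  # 5 5, matching H_out = (3-1)*2 - 2*1 + 3 = 5
```

Note how the intermediate sizes match the text: zero insertion gives 5, and adding p' = 1 of padding gives 7 before the final convolution shrinks it back to 5.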
5. (TensorFlow version) Conv2DTranspose calculation process
For the TensorFlow version of Conv2DTranspose, you first need to construct output_shape. The output-size formula is:
o=s(i-1)+a+k-2p, \quad a\in\{0,\ldots,s-1\}
This formula is essentially the inverse of the convolution output-size formula. The extra term a exists because the floor operation in convolution means that several input sizes map to the same output size, so the transposed-convolution output is not uniquely determined without it.
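The role of the term a can be seen by noting that the floor makes standard convolution many-to-one. A quick check in plain Python:

```python
def conv_out(i, k, s, p):
    """Standard convolution output size with floor."""
    return (i + 2 * p - k) // s + 1

# With k=3, s=2, p=1, inputs of size 5 and 6 both convolve down to 3 ...
print(conv_out(5, 3, 2, 1), conv_out(6, 3, 2, 1))  # 3 3

# ... so a transposed convolution with the same parameters could output either size:
# o = s*(i-1) + a + k - 2p for a in {0, ..., s-1}
i, k, s, p = 3, 3, 2, 1
print([s * (i - 1) + a + k - 2 * p for a in range(s)])  # [5, 6]
```

This is exactly why tf.nn.conv2d_transpose asks the caller for an explicit output_shape.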
5.1 Step 1: Calculate new input feature map
Consistent with PyTorch.
5.2 Step 2: Calculate the transposed convolution kernel
Consistent with PyTorch.
5.3 Step 3: Perform standard convolution operation
The third step differs from PyTorch because TensorFlow's padding algorithm is different, so the standard convolution in this step produces different outputs.
For an introduction to TensorFlow's padding algorithm, please refer to the blog: Understanding TensorFlow's padding algorithm in simple terms .
Taking the feature-map height as an example, TensorFlow's transposed-convolution formula splits into two cases:
- When (H_{out}+2p-k)\%s=0, the transposed-convolution formula is:
H_{out}=(H_{in}-1)s-2p+k \quad (10)
As shown in the figure above, take an input of size 3×3, a kernel of size 3×3, strides=2, padding=1, i.e. i=3, k=3, s=2, p=1. Then the output size is o=(3-1)\times 2-2+3=5.
- When (H_{out}+2p-k)\%s\neq 0, the transposed-convolution formula is:
H_{out}=(H_{in}-1)s-2p+k+(H_{out}+2p-k)\%s \quad (11)
As shown in the figure above, take an input of size 3×3, a kernel of size 3×3, strides=2, padding=1, i.e. i=3, k=3, s=2, p=1. Then the output size is o=(3-1)\times 2-2+3+1=6.
In the formulas above, 2p=p\_top+p\_bottom, where p\_top and p\_bottom denote the padding at the top and bottom along the Height direction.
Usually H_{out} is known (it is given via output_shape), or stride=1, from which the relevant parameters (p, H_{out}) are determined.
6. Transposed convolution can only restore the size, not the values
Standard convolution operation:
import tensorflow as tf
value = tf.reshape(tf.constant([[1., 2., 3., 4., 5.],
[6., 7., 8., 9., 10.],
[11., 12., 13., 14., 15.],
[16., 17., 18., 19., 20.],
[21., 22., 23., 24., 25.]]), [1, 5, 5, 1])
filter = tf.reshape(tf.constant([[1., 0.],
[0., 1.]]), [2, 2, 1, 1])
output = tf.nn.conv2d(value, filter, [1, 2, 2, 1], 'SAME')
print(output)
"""
tf.Tensor(
[[[[ 8.]
[12.]
[ 5.]]
[[28.]
[32.]
[15.]]
[[21.]
[23.]
[25.]]]], shape=(1, 3, 3, 1), dtype=float32)
"""
The result of standard convolution is:
output=\begin{bmatrix}8&12&5\\28&32&15\\21&23&25\end{bmatrix}
We perform a transposed convolution operation on this result using exactly the same parameters as a standard convolution operation:
input = tf.reshape(tf.constant([[8., 12., 5.],
[28., 32., 15.],
[21., 23., 25.]]), [1, 3, 3, 1])
kernel = tf.reshape(tf.constant([[1., 0.],
[0., 1.]]), [2, 2, 1, 1])
output = tf.nn.conv2d_transpose(input=input,
filters=kernel,
output_shape=[1, 5, 5, 1],
strides=[1, 2, 2, 1],
padding='SAME')
print(output)
"""
tf.Tensor(
[[[[ 8.]
[ 0.]
[12.]
[ 0.]
[ 5.]]
[[ 0.]
[ 8.]
[ 0.]
[12.]
[ 0.]]
[[28.]
[ 0.]
[32.]
[ 0.]
[15.]]
[[ 0.]
[28.]
[ 0.]
[32.]
[ 0.]]
[[21.]
[ 0.]
[23.]
[ 0.]
[25.]]]], shape=(1, 5, 5, 1), dtype=float32)
"""
The result of transposed convolution is:
output=\begin{bmatrix}8&0&12&0&5\\0&8&0&12&0\\28&0&32&0&15\\0&28&0&32&0\\21&0&23&0&25\end{bmatrix}
From this we can see that transposed convolution can only restore the dimensions, not the values.
7. The stride in Conv2DTranspose
The following figure shows the situation of different s and p in transposed convolution:
s=1, p=0, k=3 | s=2, p=0, k=3 | s=2, p=1, k=3 |
7.1 When stride=1, p=0, k=3
Input feature map (blue): (H_{in},W_{in})=(2,2).
Parameters: kernel\_size(k)=3, stride(s)=1, padding(p)=0.
New input feature map: H_{in}' = 2+(2-1)*(1-1) = 2. As shown in the figure above, the new input feature map obtained after interpolation is (2,2).
Transposed convolution kernel: kernel\_size(k')=3, stride(s')=1, padding(p')=3-0-1=2. As shown in the figure above, the padding is 2.
Output feature map (green): (H_{out},W_{out})=(4,4).
Substituting into formula (5.1):
H_{out}=(H_{in}-1)s-2p+k=(2-1)\times 1-2\times 0+3=4
7.2 When stride=2, p=0, k=3
Input feature map (blue): (H_{in},W_{in})=(2,2).
Convolution kernel: k=3, stride(s)=2, padding=0.
New input feature map: H_{in}' = 2+(2-1)*(2-1) = 3. As shown in the figure above, the new input feature map obtained after interpolation is (3,3).
Transposed convolution kernel: kernel\_size(k')=3, stride(s')=1, padding(p')=3-0-1=2. As shown in the figure above, the padding is 2.
Output feature map (green): (H_{out},W_{out})=(5,5).
Substituting into formula (5.1):
H_{out}=(H_{in}-1)s-2p+k=(2-1)\times 2-2\times 0+3=5
7.3 When stride=2, p=1, k=3
Input feature map (blue): (H_{in},W_{in})=(3,3).
Convolution kernel: k=3, stride(s)=2, padding=1.
New input feature map: H_{in}' = 3+(3-1)*(2-1) = 5. As shown in the figure above, the new input feature map obtained after interpolation is (5,5).
Transposed convolution kernel: kernel\_size(k')=3, stride(s')=1, padding(p')=3-1-1=1. As shown in the figure above, the padding is 1.
Output feature map (green): (H_{out},W_{out})=(5,5).
Substituting into formula (5.1):
H_{out}=(H_{in}-1)s-2p+k=(3-1)\times 2-2\times 1+3=5
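The three cases above can be checked in one pass. A small plain-Python sketch computes, for each case, the new input size from step 1, the transposed-kernel padding from step 2, and the final output from formula (5.1):

```python
def transposed_case(h_in, k, s, p):
    new_in = h_in + (h_in - 1) * (s - 1)   # step 1: size after zero interpolation
    p_prime = k - p - 1                    # step 2: transposed-kernel padding
    h_out = (h_in - 1) * s - 2 * p + k     # formula (5.1)
    return new_in, p_prime, h_out

# (H_in, k, s, p) for the cases in 7.1, 7.2 and 7.3
for case in [(2, 3, 1, 0), (2, 3, 2, 0), (3, 3, 2, 1)]:
    print(transposed_case(*case))  # (2, 2, 4), (3, 2, 5), (5, 1, 5)
```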
8. Checkerboard Artifacts
Checkerboard Artifacts
Convolution Operation Summary (3): Causes and Solutions of Transposed-Convolution Checkerboard Artifacts
Deconvolution and Checkerboard Artifacts
The checkerboard effect results from the "uneven overlap" of transposed convolution, which makes some parts of the image darker than others.
9. Summary
1. Conv2D feature-map transformation:
H_{out}=\frac{H_{in}+2p-k}{s}+1
2. Conv2DTranspose feature-map transformation:
H_{out}=(H_{in}-1)s-2p+k
3. Standard convolution kernel: s, p, k; transposed convolution kernel: s'=1, p'=k-p-1, k'=k.
4. The stride in step 1 determines the interpolation (the number of zero elements inserted); the expansion factor is related to strides, and the method is to insert strides-1 zeros between elements. The stride used in step 3 is always 1.
5. The input and output shapes of Conv2D and Conv2DTranspose are inverses of each other.
6. Standard convolution shrinks the image (large to small, (5,5) to (3,3)); transposed convolution enlarges it (small to large, (3,3) to (5,5)).
4. Relevant experience
1. (Keras) tf.keras.layers.Conv2DTranspose
TF official document: tf.keras.layers.Conv2DTranspose
TensorFlow function: tf.layers.Conv2DTranspose
The implementation mechanism and special case handling method of transposed convolution conv2d_transpose in tensorflow
Function prototype
We take TensorFlow v2.14.0 as the example version for introducing transposed convolution.
tf.keras.layers.Conv2DTranspose(
filters,
kernel_size,
strides=(1, 1),
padding='valid',
output_padding=None,
data_format=None,
dilation_rate=(1, 1),
activation=None,
use_bias=True,
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
kernel_regularizer=None,
bias_regularizer=None,
activity_regularizer=None,
kernel_constraint=None,
bias_constraint=None,
**kwargs
)
Parameter explanation
- filters: Integer, the dimensionality of the output space (that is, the number of output channels).
- kernel_size: A tuple or list of 2 positive integers, specifying the spatial dimensions of the filters.
- strides: A tuple or list of 2 positive integers, specifying the strides of the convolution.
- padding: A string, "valid" or "same", the padding algorithm.
- output_padding: A tuple or list of 2 positive integers, specifying the padding along the height and width of the output tensor. If set to None (default), the output shape is inferred automatically.
- data_format: A string, either channels_last (default) or channels_first, indicating the ordering of the input dimensions. channels_last corresponds to inputs with shape (batch, height, width, channels), while channels_first corresponds to inputs with shape (batch, channels, height, width).
- dilation_rate: Integer, specifying the dilation rate for all spatial dimensions of the dilated convolution.
- activation: The activation function. If set to None (default), no activation is applied.
- use_bias: Boolean, indicating whether the layer uses a bias.
- kernel_initializer: Initializer for the convolution kernel weights matrix (see keras.initializers); defaults to "glorot_uniform".
- bias_initializer: Initializer for the bias vector; defaults to "zeros".
- kernel_regularizer: Regularizer applied to the convolution kernel weights matrix (see keras.regularizers).
- bias_regularizer: Regularizer applied to the bias vector.
- activity_regularizer: Regularizer applied to the output of the layer.
- kernel_constraint: Constraint function applied to the convolution kernel.
- bias_constraint: Constraint function applied to the bias vector.
Input shape
4D tensor with shape (batch_size, channels, rows, cols) if data_format=channels_first, or 4D tensor with shape (batch_size, rows, cols, channels) if data_format=channels_last.
Output shape
4D tensor with shape (batch_size, filters, new_rows, new_cols) if data_format=channels_first, or 4D tensor with shape (batch_size, new_rows, new_cols, filters) if data_format=channels_last.
new_rows = ((rows - 1) * strides[0] + kernel_size[0] - 2 * padding[0] +
output_padding[0])
new_cols = ((cols - 1) * strides[1] + kernel_size[1] - 2 * padding[1] +
output_padding[1])
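The shape arithmetic above can be checked with a small helper. The function below is our own sketch of the formula, not part of the Keras API; padding='valid' corresponds to a per-side padding of 0:

```python
def conv2d_transpose_size(size, kernel, stride, pad, out_pad=0):
    """Output size of Conv2DTranspose along one spatial dimension,
    following the new_rows/new_cols formula above (pad is per side)."""
    return (size - 1) * stride + kernel - 2 * pad + out_pad

# padding='valid' (pad = 0): a 5-wide input upsamples to 11 with k=3, s=2
print(conv2d_transpose_size(5, 3, 2, 0))  # -> 11
```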
2. (TensorFlow)tf.nn.conv2d_transpose
TF official document: tf.nn.conv2d_transpose
tf.nn.conv2d_transpose
is sometimes called “deconvolution” after (Zeiler et al., 2010), but is really the transpose (gradient) of conv2d
rather than an actual deconvolution.
Core source code
The tf.nn.conv2d_transpose source code lives at nn_ops.py#L2689-L2773. Tracing the call, the function ultimately points to nn_ops.py#L2607:
# Inverse of the convolution operation (backward derivation): given the output, compute the input
# https://github.com/tensorflow/tensorflow/blob/v2.14.0/tensorflow/python/ops/nn_ops.py#L2547-L2609
@tf_export(v1=["nn.conv2d_backprop_input"])
@dispatch.add_dispatch_support
def conv2d_backprop_input( # pylint: disable=redefined-builtin,dangerous-default-value
input_sizes,
filter=None,
out_backprop=None,
strides=None,
padding=None,
use_cudnn_on_gpu=True,
data_format="NHWC",
dilations=[1, 1, 1, 1],
name=None,
filters=None):
  filter = deprecation.deprecated_argument_lookup(  # rename filter; the value is unchanged
"filters", filters, "filter", filter)
  padding, explicit_paddings = convert_padding(padding)  # convert the padding
return gen_nn_ops.conv2d_backprop_input(
input_sizes, filter, out_backprop, strides, padding, use_cudnn_on_gpu,
explicit_paddings, data_format, dilations, name)
# conv2d and conv2d_transpose have mutually inverse input/output shapes
# https://github.com/tensorflow/tensorflow/blob/v2.14.0/tensorflow/python/ops/nn_ops.py#L2689-L2773
@tf_export("nn.conv2d_transpose", v1=[])
@dispatch.add_dispatch_support
def conv2d_transpose_v2(
input, # pylint: disable=redefined-builtin
filters, # pylint: disable=redefined-builtin
output_shape,
strides,
padding="SAME",
data_format="NHWC",
dilations=None,
name=None):
with ops.name_scope(name, "conv2d_transpose",
[input, filter, output_shape]) as name:
if data_format is None:
data_format = "NHWC"
channel_index = 1 if data_format.startswith("NC") else 3
strides = _get_sequence(strides, 2, channel_index, "strides")
dilations = _get_sequence(dilations, 2, channel_index, "dilations")
padding, explicit_paddings = convert_padding(padding)
    return gen_nn_ops.conv2d_backprop_input(  # backward derivation of the convolution
input_sizes=output_shape,
filter=filters,
out_backprop=input,
strides=strides,
padding=padding,
explicit_paddings=explicit_paddings,
data_format=data_format,
dilations=dilations,
name=name)
As the source above shows, the transposed convolution operation only converts the padding; the filter is unchanged, and the call ultimately dispatches to the backward-input derivation of the standard convolution.
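The "inverse shapes" relationship can be verified with two tiny helpers (our own names, based on formula (1) from section 2): whenever (i + 2p - k) is divisible by s, the transposed formula exactly undoes the standard one.

```python
def conv2d_out(i, k, s, p):
    # standard convolution output size, formula (1)
    return (i + 2 * p - k) // s + 1

def conv2d_transpose_out(o, k, s, p):
    # shape inverse of conv2d_out when (i + 2p - k) % s == 0
    return (o - 1) * s - 2 * p + k

i = 5
o = conv2d_out(i, k=3, s=2, p=1)                     # 5 -> 3
assert conv2d_transpose_out(o, k=3, s=2, p=1) == i   # 3 -> 5
```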
2.1 Function prototype
tf.nn.conv2d_transpose(
input,
filters,
output_shape,
strides,
padding='SAME',
data_format='NHWC',
dilations=None,
name=None
)
Parameter explanation
- input: A 4-D Tensor of type float and shape [batch, height, width, in_channels] for NHWC data format, or [batch, in_channels, height, width] for NCHW data format.
- filters: A 4-D Tensor with the same type as input and shape [height, width, output_channels, in_channels]. filter's in_channels dimension must match that of input.
- output_shape: A 1-D Tensor representing the output shape of the deconvolution op.
- strides: An integer or a list of 1, 2, or 4 positive integers specifying the stride of the convolution.
- padding: Either the string "SAME" or "VALID", indicating the type of padding algorithm to use.
- data_format: A string; 'NHWC' and 'NCHW' are supported.
- dilations: An integer or a list of 1, 2, or 4 positive integers (default 1) specifying the dilation rate for all spatial dimensions of the dilated convolution.
- name: Optional name for the returned tensor.
2.2 Code examples
import tensorflow as tf
import numpy as np
def test_conv2d_transpose():
    # input shape = (1, 2, 2, 1) -> (batch_size, height, width, channels): a 2x2x1 image, batch of 1
    x = tf.constant(np.array([[
        [[1], [2]],
        [[3], [4]]
    ]]), tf.float32)
    # filters shape = (3, 3, 1, 1) -> (height, width, output_channels, in_channels): a 3x3 all-ones filter
    f = tf.constant(np.array([
        [[[1]], [[1]], [[1]]],
        [[[1]], [[1]], [[1]]],
        [[[1]], [[1]], [[1]]]
    ]), tf.float32)
    conv = tf.nn.conv2d_transpose(x, f, output_shape=(1, 4, 4, 1),
                                  strides=[1, 2, 2, 1], padding='SAME')
    result = conv.numpy()  # TF 2.x eager mode; under TF 1.x, evaluate inside a tf.Session instead
    assert (np.array([[
        [[1.0], [1.0], [3.0], [2.0]],
        [[1.0], [1.0], [3.0], [2.0]],
        [[4.0], [4.0], [10.0], [6.0]],
        [[3.0], [3.0], [7.0], [4.0]]]]) == result).all()
2.3 Code analysis
Known conditions:
# 2*2*1 ——> 4*4*1
(in_height, in_width)=(2, 2)
(filter_height, filter_width)=(3, 3)
(strides[1], strides[2])=(2, 2)
According to TensorFlow's padding algorithm, we can derive:
in_height % strides[1] = 2%2 = 0
pad_along_height = max(filter_height - stride_height, 0)=max(3-2, 0)=1
pad_top = pad_along_height // 2 = 1 // 2 = 0
pad_bottom = pad_along_height - pad_top = 1-0 = 1
Then find the padding in the Height direction:
(pad_top, pad_bottom)=(0, 1)
In the same way, the padding in the Width direction is obtained as:
(pad_left, pad_right)=(0, 1)
After the transposed convolution operation, the output size is doubled, i.e. H_out = 4, so:
(H_out + 2p - k) % s = (4 + (0+1) - 3) % 2 = 0
Then, substituting the known conditions into formula (10), we get:
H_out = (2-1)*2 - (0+1) + 3 = 4
In the same way, W_out = (2-1)*2 - (0+1) + 3 = 4
To sum up, the output size is (4, 4, 1), consistent with the results of code verification.
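The computation above has a simple "scatter" interpretation, which can be reproduced in plain NumPy. The sketch below is our own single-channel implementation of the 'SAME' case (for in_size divisible by stride), not TensorFlow code:

```python
import numpy as np

def conv2d_transpose_same(x, kernel, s):
    """Single-channel transposed convolution with TF-style 'SAME' padding
    (case in_size % s == 0): each input value scatters a copy of the
    kernel into the output grid, shifted by the stride."""
    ih, iw = x.shape
    kh, kw = kernel.shape
    oh, ow = ih * s, iw * s            # 'SAME' output = input * stride
    pad_top = max(kh - s, 0) // 2      # forward-conv padding, as derived above
    pad_left = max(kw - s, 0) // 2
    out = np.zeros((oh, ow))
    for i in range(ih):
        for j in range(iw):
            for a in range(kh):
                for b in range(kw):
                    p, q = s * i + a - pad_top, s * j + b - pad_left
                    if 0 <= p < oh and 0 <= q < ow:
                        out[p, q] += x[i, j] * kernel[a, b]
    return out

x = np.array([[1., 2.], [3., 4.]])
print(conv2d_transpose_same(x, np.ones((3, 3)), 2))  # same 4x4 result as the TF example above
```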
3. (PyTorch)torch.nn.ConvTranspose2d
3.1 Function prototype
CLASS torch.nn.ConvTranspose2d(in_channels,
out_channels,
kernel_size,
stride=1,
padding=0,
output_padding=0,
groups=1,
bias=True,
dilation=1,
padding_mode='zeros',
device=None,
dtype=None)
Parameter explanation
- in_channels (int) – Number of channels in the input image
- out_channels (int) – Number of channels produced by the convolution
- kernel_size (int or tuple) – Size of the convolving kernel
- stride (int or tuple, optional) – Stride of the convolution. Default: 1
- padding (int or tuple, optional) – dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of each dimension in the input. Default: 0
- output_padding (int or tuple, optional) – Additional size added to one side of each dimension in the output shape. Default: 0. Note that output_padding only affects the computed output shape; it does not actually zero-pad the output. The computed output feature map is extended by extra rows/columns along the height and width on one side only (bottom/right), not both sides.
- groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1 (ordinary convolution); only relevant for grouped convolution.
- bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
- dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1 (ordinary convolution); only relevant for dilated convolution.
Variables
- weight (Tensor) – the learnable weights of the module, of shape (in_channels, out_channels / groups, kernel_size[0], kernel_size[1]).
- bias (Tensor) – the learnable bias of the module, of shape (out_channels), present if bias is True.
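PyTorch's documented output-size formula for ConvTranspose2d ties all of these parameters together; the one-line helper below is our own sketch of that formula, not a PyTorch API:

```python
def convtranspose2d_out(h_in, k, stride=1, padding=0, output_padding=0, dilation=1):
    # H_out per the ConvTranspose2d documentation
    return (h_in - 1) * stride - 2 * padding + dilation * (k - 1) + output_padding + 1

print(convtranspose2d_out(2, 3))  # -> 4, the s=1, p=0, k=3 example in 3.2
```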
3.2 Code examples
The following uses the PyTorch framework to simulate the transposed convolution operation with s=1, p=0, k=3.
In the code, the transposed_conv_official function computes the result with the official transposed convolution, while the transposed_conv_self function pads the input feature map by hand, following the steps described above, and then applies a standard convolution.
import torch
import torch.nn as nn
def transposed_conv_official():
    # input feature map
    feature_map = torch.as_tensor([[1, 0],
                                   [2, 1]], dtype=torch.float32).reshape([1, 1, 2, 2])
    print(feature_map)
    # instantiate the transposed convolution
    trans_conv = nn.ConvTranspose2d(in_channels=1, out_channels=1,
                                    kernel_size=3, stride=1, bias=False)
    # load the standard convolution kernel (note: the standard kernel, not the transposed one)
    trans_conv.load_state_dict({
        "weight": torch.as_tensor([[1, 0, 1],
                                   [0, 1, 1],
                                   [1, 0, 0]], dtype=torch.float32).reshape([1, 1, 3, 3])})
    print(trans_conv.weight)
    # run the transposed convolution
    output = trans_conv(feature_map)
    print(output)

def transposed_conv_self():
    # new input feature map (the original 2x2 map padded with zeros by hand)
    feature_map = torch.as_tensor([[0, 0, 0, 0, 0, 0],
                                   [0, 0, 0, 0, 0, 0],
                                   [0, 0, 1, 0, 0, 0],
                                   [0, 0, 2, 1, 0, 0],
                                   [0, 0, 0, 0, 0, 0],
                                   [0, 0, 0, 0, 0, 0]], dtype=torch.float32).reshape([1, 1, 6, 6])
    print(feature_map)
    # instantiate a standard convolution
    conv = nn.Conv2d(in_channels=1, out_channels=1,
                     kernel_size=3, stride=1, bias=False)
    # flip the standard kernel vertically and horizontally to obtain the transposed-convolution kernel
    conv.load_state_dict({
        "weight": torch.as_tensor([[0, 0, 1],
                                   [1, 1, 0],
                                   [1, 0, 1]], dtype=torch.float32).reshape([1, 1, 3, 3])})
    print(conv.weight)
    # run the standard convolution
    output = conv(feature_map)
    print(output)

def main():
    transposed_conv_official()
    print("---------------")
    transposed_conv_self()

if __name__ == '__main__':
    main()
Output results
tensor([[[[1., 0.],
[2., 1.]]]])
Parameter containing:
tensor([[[[1., 0., 1.],
[0., 1., 1.],
[1., 0., 0.]]]], requires_grad=True)
tensor([[[[1., 0., 1., 0.],
[2., 2., 3., 1.],
[1., 2., 3., 1.],
[2., 1., 0., 0.]]]], grad_fn=<SlowConvTranspose2DBackward>)
---------------
tensor([[[[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0.],
[0., 0., 2., 1., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]]]])
Parameter containing:
tensor([[[[0., 0., 1.],
[1., 1., 0.],
[1., 0., 1.]]]], requires_grad=True)
tensor([[[[1., 0., 1., 0.],
[2., 2., 3., 1.],
[1., 2., 3., 1.],
[2., 1., 0., 0.]]]], grad_fn=<ThnnConv2DBackward>)
3.3 Code analysis
Input feature map M: H_in = 2.
Standard convolution kernel K: k = 3, s = 1, p = 0.
New input feature map M': H_in' = 2 + (2-1)*(1-1) = 2 (no zeros are inserted between elements, since s = 1).
Transposed convolution kernel K': k' = k = 3, s' = 1, p' = k - p - 1 = 3 - 0 - 1 = 2.
Final result of the transposed convolution: H_out = (2-1)*1 - 2*0 + 3 = 4.
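The equivalence in this section can also be checked without PyTorch. The sketch below (plain NumPy, our own helper code) pads the 2x2 input with p' = 2 zeros per side, flips the kernel both ways, and slides a stride-1 window:

```python
import numpy as np

x = np.array([[1., 0.], [2., 1.]])                        # input feature map M
k = np.array([[1., 0., 1.], [0., 1., 1.], [1., 0., 0.]])  # standard kernel K
padded = np.pad(x, 2, mode='constant')                    # p' = k - p - 1 = 2 -> 6x6 map
flipped = k[::-1, ::-1]                                   # flip vertically and horizontally
out = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        out[i, j] = (padded[i:i + 3, j:j + 3] * flipped).sum()
print(out)  # same 4x4 result as both PyTorch functions above
```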
4. DCGAN
Paper: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Deep Convolutional Generative Adversarial Network
4.1 Code examples
The generator G in the DCGAN network uses tf.keras.layers.Conv2DTranspose (upsampling) layers to generate an image from a seed (random noise). Start with a Dense layer that takes this seed as input, then upsample several times until you reach the desired image size of 28x28x1.
def make_generator_model():
    model = tf.keras.Sequential()  # create the model instance
    # the first layer must specify input_shape; Dense takes the 100-dim noise vector
    model.add(layers.Dense(7*7*256, use_bias=False, input_shape=(100,)))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    model.add(layers.Reshape((7, 7, 256)))
    assert model.output_shape == (None, 7, 7, 256)  # Note: None is the batch size
    # 7*7*256 -> 7*7*128
    model.add(layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False))
    assert model.output_shape == (None, 7, 7, 128)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    # 7*7*128 -> 14*14*64
    model.add(layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False))
    assert model.output_shape == (None, 14, 14, 64)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    # 14*14*64 -> 28*28*1
    model.add(layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same', use_bias=False, activation='tanh'))
    assert model.output_shape == (None, 28, 28, 1)
    return model
4.2 Code analysis
step1:7*7*256 ——> 7*7*128
Analyze layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False)
:
Known conditions:
# 7*7*256 ——> 7*7*128
(in_height, in_width)=(7, 7)
(filter_height, filter_width)=(5, 5)
(strides[1], strides[2])=(1, 1)
According to TensorFlow's padding algorithm, we can derive:
in_height % strides[1] = 7%1 = 0
pad_along_height = max(filter_height - stride_height, 0)=max(5-1, 0)=4
pad_top = pad_along_height // 2 = 4 // 2 = 2
pad_bottom = pad_along_height - pad_top = 4-2 = 2
Then find the padding in the Height direction:
(pad_top, pad_bottom)=(2, 2)
In the same way, the padding in the Width direction is obtained as:
(pad_left, pad_right)=(2, 2)
Since s = 1, the condition (H_out + 2p - k) % s = 0 holds, so substituting the known conditions into formula (10) gives:
H_out = (7-1)*1 - (2+2) + 5 = 7
In the same way, W_out = (7-1)*1 - (2+2) + 5 = 7
To sum up, the output size is (7, 7, 128), consistent with the results of code verification.
step2:7*7*128 ——> 14*14*64
Analyze layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False)
:
Known conditions:
# 7*7*128 ——> 14*14*64
(in_height, in_width)=(7, 7)
(filter_height, filter_width)=(5, 5)
(strides[1], strides[2])=(2, 2)
According to TensorFlow's padding algorithm, we can derive:
in_height % strides[1] = 7%2 = 1
pad_along_height = max(filter_height - (in_height % stride_height), 0)=max(5-1, 0)=4
pad_top = pad_along_height // 2 = 4 // 2 = 2
pad_bottom = pad_along_height - pad_top = 4-2 = 2
Then find the padding in the Height direction:
(pad_top, pad_bottom)=(2, 2)
In the same way, the padding in the Width direction is obtained as:
(pad_left, pad_right)=(2, 2)
After the transposed convolution operation, the output size is doubled, i.e. H_out = 14, so:
(H_out + 2p - k) % s = (14 + (2+2) - 5) % 2 = 1
Then, substituting the known conditions into formula (11), we get:
H_out = (7-1)*2 - (2+2) + 5 + (14 + (2+2) - 5) % 2 = 14
In the same way, W_out = (7-1)*2 - (2+2) + 5 + (14 + (2+2) - 5) % 2 = 14
To sum up, the output size is (14, 14, 64), consistent with the results of code verification.
step3:14*14*64 ——> 28*28*1
Analyze layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same', use_bias=False, activation='tanh')
:
Known conditions:
# 14*14*64 ——> 28*28*1
(in_height, in_width)=(14, 14)
(filter_height, filter_width)=(5, 5)
(strides[1], strides[2])=(2, 2)
According to TensorFlow's padding algorithm, we can derive:
in_height % strides[1] = 14%2 = 0
pad_along_height = max(filter_height - stride_height, 0)=max(5-2, 0)=3
pad_top = pad_along_height // 2 = 3 // 2 = 1
pad_bottom = pad_along_height - pad_top = 3-1 = 2
Then find the padding in the Height direction:
(pad_top, pad_bottom)=(1, 2)
In the same way, the padding in the Width direction is obtained as:
(pad_left, pad_right)=(1, 2)
After the transposed convolution operation, the output size is doubled, i.e. H_out = 28, so:
(H_out + 2p - k) % s = (28 + (1+2) - 5) % 2 = 0
Then, substituting the known conditions into formula (10), we get:
H_out = (14-1)*2 - (1+2) + 5 = 28
In the same way, W_out = (14-1)*2 - (1+2) + 5 = 28
To sum up, the output size is (28, 28, 1), consistent with the results of code verification.
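The three step-by-step derivations above follow one recipe, which can be collected into a small sketch. This is pure Python with our own function names; "formula (10)" and "formula (11)" refer to the formulas used throughout this section:

```python
def same_pads(in_size, k, s):
    # TensorFlow 'SAME' padding amounts for the corresponding forward convolution
    if in_size % s == 0:
        pad_along = max(k - s, 0)
    else:
        pad_along = max(k - in_size % s, 0)
    return pad_along // 2, pad_along - pad_along // 2   # (pad_top, pad_bottom)

def transpose_same_out(in_size, k, s):
    pt, pb = same_pads(in_size, k, s)
    base = (in_size - 1) * s - (pt + pb) + k            # formula (10)
    return base + (in_size * s + pt + pb - k) % s       # formula (11) correction

print(transpose_same_out(7, 5, 1))    # step 1: -> 7
print(transpose_same_out(7, 5, 2))    # step 2: -> 14
print(transpose_same_out(14, 5, 2))   # step 3: -> 28
```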