Understanding transposed convolution in TensorFlow

Transposed convolution is used in FCN, U-Net, and GANs. It should not simply be understood as the inverse of convolution ("deconvolution").

The forward convolutions in a CNN progressively shrink the spatial size of the feature map, while transposed convolutions gradually enlarge it, up to the size of the original image.

To understand transposed convolution, start by rewriting the convolution itself as a matrix multiplication.
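As a rough illustration (a minimal NumPy sketch, not the original figures): a 3*3 kernel sliding over a 4*4 input with stride 1 and VALID padding can be written as a sparse 4*16 matrix C acting on the flattened input. Multiplying by C shrinks 4*4 to 2*2, and multiplying by its transpose C.T maps the 2*2 result back to a 4*4 shape (the shape is restored, the original values are not):

import numpy as np

# Convolution as a matrix multiplication (stride 1, VALID padding).
x = np.arange(16, dtype=np.float32).reshape(4, 4)   # 4*4 input image
k = np.random.randn(3, 3).astype(np.float32)        # 3*3 kernel

# Build the 4*16 convolution matrix C: each row lays the kernel over one
# of the four 2*2 output positions of the flattened 4*4 input.
C = np.zeros((4, 16), dtype=np.float32)
for row, (i, j) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)]):
    for di in range(3):
        for dj in range(3):
            C[row, (i + di) * 4 + (j + dj)] = k[di, dj]

y = C @ x.reshape(-1)            # forward conv: 4*4 -> 2*2
x_up = (C.T @ y).reshape(4, 4)   # transposed conv: 2*2 -> 4*4
print(y.reshape(2, 2).shape)     # (2, 2)
print(x_up.shape)                # (4, 4)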

That gives a general picture of what transposed convolution is and what it can do: from the small feature map produced by a convolution, a transposed convolution can recover a larger image (the analogy is not exact, but it is good enough for intuition).

The API for transposed convolution in TensorFlow is tf.nn.conv2d_transpose.

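The doc screenshots did not survive, so here is the signature as I recall it from the TF 1.x docs linked below (the keyword names match the code in this post; data_format and name are assumed to keep their usual defaults):

# tf.nn.conv2d_transpose(value, filter, output_shape, strides,
#                        padding='SAME', data_format='NHWC', name=None)
#
# value:        input tensor,  [batch, height, width, in_channels]
# filter:       kernel tensor, [height, width, output_channels, in_channels]
#               (note: the channel order is the reverse of tf.nn.conv2d)
# output_shape: 1-D tensor giving the desired shape of the output
# strides:      strides of the corresponding *forward* convolution
# padding:      'SAME' or 'VALID', interpreted for the forward convolution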

Here is how the padding parameter behaves in TensorFlow:
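The missing screenshot presumably showed the output-size rules from the TF docs. In short, for a forward convolution VALID gives out = ceil((in - kernel + 1) / stride) and SAME gives out = ceil(in / stride). A quick sketch with the same sizes used below:

import tensorflow as tf

# Forward convolution output sizes under the two padding modes, stride 2:
# VALID: ceil((255 - 5 + 1) / 2) = 126, SAME: ceil(255 / 2) = 128.
x = tf.random_normal([1, 255, 255, 3])
k = tf.random_normal([5, 5, 3, 10])
valid = tf.nn.conv2d(x, k, strides=[1, 2, 2, 1], padding='VALID')
same = tf.nn.conv2d(x, k, strides=[1, 2, 2, 1], padding='SAME')
print(valid.shape)  # (1, 126, 126, 10)
print(same.shape)   # (1, 128, 128, 10)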

Finally, here is how to use transposed convolution in TensorFlow:

import tensorflow as tf

# Forward convolution

# input shape: [batch, height, width, in_channels]
inputx = tf.random_normal([100, 255, 255, 3], dtype=tf.float32)
# kernel: [height, width, in_channels, output_channels]
w = tf.random_normal(shape=[5, 5, 3, 10], dtype=tf.float32)
# convolution output is (100, 126, 126, 10); note that VALID padding is used here
outputy = tf.nn.conv2d(input=inputx, filter=w, strides=[1, 2, 2, 1], padding='VALID')

Computing the output shape of a forward convolution is easy. But what if I now want to choose the kernel size and strides of a transposed convolution based on the size of the output image I want? Code first:

# Transposed convolution
# input value: [batch, height, width, in_channels]
value = tf.random_normal(shape=[100, 126, 126, 10])
# filter: [height, width, output_channels, in_channels]
w = tf.random_normal(shape=[4, 4, 3, 10])
# result of the transposed convolution
result = tf.nn.conv2d_transpose(value=value, filter=w, output_shape=[100, 255, 255, 3], strides=[1, 2, 2, 1], padding='VALID')



with tf.Session() as sess:
    tf.global_variables_initializer().run()
    # sess.run(outputy)
    # print(outputy.shape)

    sess.run(result)
    print(result.shape)

Actually there is a very simple trick: work the transposed convolution out as if it were a forward convolution. Here is what that means. In the code above I want to end up with a 255*255*3 image (the image enlarged by the transposed convolution). The first step is to decide which padding mode to use, since different padding modes imply different size formulas. The code uses VALID, and the input to the transposed convolution is a 126*126*10 feature map, so by the size formula
ceil((255 - kernel + 1) / stride) = 126, where ceil denotes rounding up.
Setting kernel = 4 and stride = 2 satisfies this exactly: ceil((255 - 4 + 1) / 2) = ceil(252 / 2) = 126.
If padding is set to SAME instead, the formula becomes ceil(255 / stride) = 126, and no integer stride satisfies it. In other words, a 126*126 input cannot be enlarged to 255*255 by a SAME-padded transposed convolution; with stride 2 you would have to feed a 128*128 input into the transposed convolution to get a 255*255 output.
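As a check, here is a minimal sketch of that SAME-padding case (same batch and channel sizes as above, but a hypothetical 128*128 input); with SAME padding the output size depends only on the stride, not on the kernel size:

import tensorflow as tf

# SAME-padding transposed convolution: ceil(255 / 2) = 128, so the input
# must be 128*128 for a 255*255 output (the kernel size does not matter here).
value = tf.random_normal(shape=[100, 128, 128, 10])
w = tf.random_normal(shape=[4, 4, 3, 10])
result_same = tf.nn.conv2d_transpose(value=value, filter=w,
                                     output_shape=[100, 255, 255, 3],
                                     strides=[1, 2, 2, 1], padding='SAME')
print(result_same.shape)  # (100, 255, 255, 3)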

Reference links:
http://blog.csdn.net/u013250416/article/details/78247818
https://www.tensorflow.org/api_docs/python/tf/nn/conv2d_transpose
https://www.tensorflow.org/api_docs/python/tf/nn/convolution
"A guide to convolution arithmetic for deep learning"
