Upsampling (deconvolution) in the U-Net image segmentation algorithm

In the U-Net model, target features are first extracted through four downsampling stages and then passed through four upsampling stages. Each pixel of the resulting feature map is then classified individually, which achieves semantic segmentation.

Downsampling uses the convolutional layers of a traditional convolutional neural network. The input is first convolved by Conv2D, then normalized by BatchNormalization (batch normalization), and then passed through the ReLU activation function. The core operation is the convolution itself, whose result is obtained by sliding the convolution kernel across the image.
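The three steps of a downsampling block can be sketched in plain NumPy. This is only an illustration under simplifying assumptions, not the Keras layers themselves: a single channel, one kernel, no padding, stride 1, and batch normalization without its learnable scale and shift. The helper names, the random batch, and the random kernel are all hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def batch_norm(batch, eps=1e-5):
    """Normalize a batch of single-channel feature maps to zero mean and unit variance.

    Assumption: the learnable scale and shift of a real batch normalization layer are omitted.
    """
    return (batch - batch.mean()) / np.sqrt(batch.var() + eps)

def relu(x):
    """Set negative values to zero."""
    return np.maximum(x, 0.0)

def down_block(images, kernel):
    """Convolution, then batch normalization, then ReLU."""
    features = np.stack([conv2d_valid(img, kernel) for img in images])
    return relu(batch_norm(features))

rng = np.random.default_rng(0)
images = rng.normal(size=(8, 6, 6))  # hypothetical batch of eight 6x6 single-channel inputs
kernel = rng.normal(size=(3, 3))     # hypothetical kernel; a real network learns these weights
features = down_block(images, kernel)
print(features.shape)  # (8, 4, 4)
```

Each 6x6 input shrinks to 4x4 because a 3x3 kernel without padding fits in only 4 positions along each axis.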

The upsampling process is a deconvolution process. Early semantic segmentation networks also needed upsampling, and the most common way to do it was zero filling or nearest-neighbor interpolation. This approach is simple and crude, and its shortcomings are obvious: it cannot restore the detail of the image, and because it has no parameters, the process cannot be learned.
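The two fixed upsampling methods mentioned above can be sketched with NumPy as follows. The function names and the 2x2 input are hypothetical and chosen only for illustration. Neither function has any trainable parameter, which is why this kind of upsampling cannot be learned.

```python
import numpy as np

def upsample_zero_fill(x, factor=2):
    """Place each input value on a coarser grid and fill the gaps with zeros."""
    h, w = x.shape
    out = np.zeros((h * factor, w * factor), dtype=x.dtype)
    out[::factor, ::factor] = x
    return out

def upsample_nearest(x, factor=2):
    """Repeat each input value `factor` times along both axes."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

x = np.array([[1, 2],
              [3, 4]])

zero_filled = upsample_zero_fill(x)
nearest = upsample_nearest(x)

print(zero_filled)
# [[1 0 2 0]
#  [0 0 0 0]
#  [3 0 4 0]
#  [0 0 0 0]]
print(nearest)
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```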

The ICCV 2015 paper "Learning Deconvolution Network for Semantic Segmentation" proposes a learnable deconvolution network. Upsampling is no longer done by zero filling or nearest-neighbor interpolation alone, so the whole process becomes learnable, and the upsampling is trained as part of the image semantic segmentation network. The "deconvolution" in the paper is more accurately called transposed convolution.

The process of convolution 

   

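Take a 4*4 two-dimensional matrix D and a 3*3 convolution kernel. The values below are read off the products in the calculation that follows:

 D = |3  3  2  1|
     |0  0  1  3|
     |3  1  2  2|
     |2  0  0  2|

 kernel = |0  1  2|
          |2  2  0|
          |0  1  2|

With stride 1 and no padding, the result is a 2*2 matrix:

 |12     12|
 |10     17|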

The calculation for each output position is as follows, with each product written as kernel value x input value:

(0,0): 0x3+1x3+2x2+2x0+2x0+0x1+0x3+1x1+2x2 = 12
(0,1): 0x3+1x2+2x1+2x0+2x1+0x3+0x1+1x2+2x2 = 12
(1,0): 0x0+1x0+2x1+2x3+2x1+0x2+0x2+1x0+2x0 = 10
(1,1): 0x0+1x1+2x3+2x1+2x2+0x2+0x0+1x0+2x2 = 17

Each output value is a linear combination: the convolution kernel is placed over a window of the matrix, the values at corresponding positions are multiplied, and the products are summed. Moving the kernel to the next window gives the next output value.
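The worked example can be checked with a few lines of NumPy. The input D and the kernel K are not given as matrices in the text. They are reconstructed here from the four sums above, taking the first factor of each product as the kernel value and the second as the input value. The function name conv2d_valid is hypothetical. As in the arithmetic above, the kernel is not flipped, which is the usual convention in deep learning.

```python
import numpy as np

# Input and kernel reconstructed from the products in the worked arithmetic.
D = np.array([[3, 3, 2, 1],
              [0, 0, 1, 3],
              [3, 1, 2, 2],
              [2, 0, 0, 2]])
K = np.array([[0, 1, 2],
              [2, 2, 0],
              [0, 1, 2]])

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)
    return out

result = conv2d_valid(D, K)
print(result)
# [[12 12]
#  [10 17]]
```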

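A convolution can also be written as a matrix multiplication, Y = CX, where Y is the output, C is the parameter matrix built from the kernel weights, and X is the input matrix flattened into a vector. In the example above, X has 16 elements, C is a 4*16 matrix, and Y has 4 elements.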
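One way to make the matrix form concrete is to build C explicitly for the example. The sketch below assumes row-major flattening, stride 1, and no padding. The function name conv_matrix is hypothetical, and D and K are the values reconstructed from the worked arithmetic. Each row of C holds the kernel weights at the input positions covered by one output position, with zeros everywhere else.

```python
import numpy as np

D = np.array([[3, 3, 2, 1],
              [0, 0, 1, 3],
              [3, 1, 2, 2],
              [2, 0, 0, 2]])
K = np.array([[0, 1, 2],
              [2, 2, 0],
              [0, 1, 2]])

def conv_matrix(kernel, in_h, in_w):
    """Build C so that C @ input.reshape(-1) equals the flattened convolution output."""
    kh, kw = kernel.shape
    out_h, out_w = in_h - kh + 1, in_w - kw + 1
    C = np.zeros((out_h * out_w, in_h * in_w), dtype=kernel.dtype)
    for i in range(out_h):
        for j in range(out_w):
            row = i * out_w + j  # index of output position (i, j)
            for a in range(kh):
                for b in range(kw):
                    col = (i + a) * in_w + (j + b)  # index of the input value it touches
                    C[row, col] = kernel[a, b]
    return C

C = conv_matrix(K, 4, 4)
X = D.reshape(-1)  # 16-element input vector
Y = C @ X          # 4-element output vector

print(C.shape)  # (4, 16)
print(Y)        # [12 12 10 17]
```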

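Deconvolution, in contrast, multiplies its input X by the transpose of the same parameter matrix to get its output: Y = C^T X. For the 4*4 input and 2*2 output of the example above, C is a 4*16 matrix, so C^T is 16*4, and a 2*2 input (4 elements) yields a 4*4 output (16 elements).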

 

Note that deconvolution only restores the dimensions of the image. It does not recover the original pixel values, only some features of the image, so it is not a true inverse operation. In essence, deconvolution is still a kind of convolution.
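The point that only the shape is restored can be illustrated by multiplying the 2*2 result of the example by the transposed parameter matrix. This is a sketch under the same assumptions as the worked example (stride 1, no padding, row-major flattening). The names conv_matrix, small and restored are hypothetical, and D and K are the values reconstructed from the worked arithmetic. A trained network would learn its own weights for the transposed convolution instead of reusing those of the forward convolution.

```python
import numpy as np

D = np.array([[3, 3, 2, 1],
              [0, 0, 1, 3],
              [3, 1, 2, 2],
              [2, 0, 0, 2]])
K = np.array([[0, 1, 2],
              [2, 2, 0],
              [0, 1, 2]])

def conv_matrix(kernel, in_h, in_w):
    """Build C so that C @ input.reshape(-1) equals the flattened convolution output."""
    kh, kw = kernel.shape
    out_h, out_w = in_h - kh + 1, in_w - kw + 1
    C = np.zeros((out_h * out_w, in_h * in_w), dtype=kernel.dtype)
    for i in range(out_h):
        for j in range(out_w):
            for a in range(kh):
                for b in range(kw):
                    C[i * out_w + j, (i + a) * in_w + (j + b)] = kernel[a, b]
    return C

C = conv_matrix(K, 4, 4)

small = C @ D.reshape(-1)                 # forward convolution: 16 values -> 4 values
restored = (C.T @ small).reshape(4, 4)    # transposed convolution: 4 values -> 16 values

print(restored.shape)               # (4, 4)
print(np.array_equal(restored, D))  # False
```

The output has the same 4*4 shape as D, but its values differ from D. For example, the top-left element is 0 because the only kernel weight that reaches it is 0, while D has 3 there.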

reference:

https://iksinc.online/2017/05/06/deconvolution-in-deep-learning/

Origin blog.csdn.net/qq_35326529/article/details/128099209