(Torch) SpatialConvolution and SpatialFullConvolution

Torch's nn module contains the common convolutional layers. This article introduces two of them, whose names are similar but whose roles differ:

  • Spatial Modules:
    • SpatialConvolution: applies a 2D convolution over an input image
    • SpatialFullConvolution: applies a 2D full convolution over an input image

The two convolution operations are defined as follows:

1. SpatialConvolution

module = nn.SpatialConvolution(nInputPlane, nOutputPlane, kW, kH, [dW], [dH], [padW], [padH])

This module applies a 2D convolution over a multi-channel image. The expected input is a 3D Tensor of size nInputPlane x height x width, where nInputPlane is often called the number of channels.

The meaning of each parameter is:

  • nInputPlane: number of channels of the input image
  • nOutputPlane: number of channels of the convolution layer's output (num_output in Caffe)
  • kW: width of the convolution kernel window
  • kH: height of the convolution kernel window
  • dW: stride of the convolution window along the width, default 1
  • dH: stride of the convolution window along the height, default 1
  • padW / padH: number of zeros padded around the input, default 0; setting them to (kW-1)/2 and (kH-1)/2 keeps the output feature-map plane the same size as the input plane (with stride 1)

For a 3D input Tensor (nInputPlane x height x width), the output feature map of the convolution layer has size nOutputPlane x oheight x owidth, where:

owidth  = floor((width  + 2*padW - kW) / dW + 1)
oheight = floor((height + 2*padH - kH) / dH + 1)
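These formulas are easy to check with a small helper. The following is a sketch in plain Python, not Torch code; the function name is ours:

```python
import math

# Compute the SpatialConvolution output plane size from the formulas above.
def conv_output_size(width, height, kW, kH, dW=1, dH=1, padW=0, padH=0):
    owidth  = math.floor((width  + 2 * padW - kW) / dW + 1)
    oheight = math.floor((height + 2 * padH - kH) / dH + 1)
    return owidth, oheight

# A 3x3 kernel with padW = padH = (kW-1)/2 = 1 and stride 1
# preserves the input plane size:
print(conv_output_size(32, 32, 3, 3, dW=1, dH=1, padW=1, padH=1))  # -> (32, 32)
```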

The convolution is parameterized by the kernel weights self.weight (a Tensor of size nOutputPlane x nInputPlane x kH x kW) and the bias self.bias (a Tensor of size nOutputPlane); the output of the convolution layer is computed as:

output[i][j][k] = bias[k]
  + sum_l sum_{s=1}^kW sum_{t=1}^kH weight[s][t][l][k] * input[dW*(i-1)+s][dH*(j-1)+t][l]
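As a sketch (in Python, not Torch code, using 0-based indices instead of the 1-based indices above), the sum can be written as naive nested loops; the weight is indexed here as weight[k][l][t][s] to match the nOutputPlane x nInputPlane x kH x kW storage layout:

```python
# Naive 2D convolution matching the formula above (no padding).
# inp:    nested lists [nInputPlane][height][width]
# weight: nested lists [nOutputPlane][nInputPlane][kH][kW]
# bias:   list of length nOutputPlane
def spatial_convolution(inp, weight, bias, dW=1, dH=1):
    nIn, height, width = len(inp), len(inp[0]), len(inp[0][0])
    nOut, kH, kW = len(weight), len(weight[0][0]), len(weight[0][0][0])
    oheight = (height - kH) // dH + 1
    owidth  = (width  - kW) // dW + 1
    # Initialize every output element with its plane's bias.
    out = [[[bias[k] for _ in range(owidth)] for _ in range(oheight)]
           for k in range(nOut)]
    for k in range(nOut):              # output plane
        for i in range(oheight):       # output row
            for j in range(owidth):    # output column
                for l in range(nIn):   # input plane
                    for t in range(kH):
                        for s in range(kW):
                            out[k][i][j] += (weight[k][l][t][s]
                                             * inp[l][dH * i + t][dW * j + s])
    return out
```

For a 2x2 input of [[1, 2], [3, 4]] and a single 2x2 all-ones kernel with zero bias, the single output element is 1 + 2 + 3 + 4 = 10.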

2. SpatialFullConvolution

module = nn.SpatialFullConvolution(nInputPlane, nOutputPlane, kW, kH, [dW], [dH], [padW], [padH], [adjW], [adjH])

At first glance it looks basically the same as SpatialConvolution, with just two extra parameters. In other frameworks this operation goes by several names: "In-network Upsampling", "Fractionally-strided convolution", "Backwards Convolution", "Deconvolution", or "Upconvolution".

Simply put, it is a deconvolution or upsampling operation; most of its parameters have the same meaning as in SpatialConvolution:

  • nOutputPlane: number of channels of the convolution layer's output (num_output in Caffe)
  • kW: width of the convolution kernel window
  • kH: height of the convolution kernel window
  • dW: stride of the convolution window along the width, default 1
  • dH: stride of the convolution window along the height, default 1
  • padW / padH: number of zeros padded around the input, default 0; setting them to (kW-1)/2 and (kH-1)/2 keeps the output feature-map plane the same size as the input plane (with stride 1)
  • adjW / adjH: extra width/height added to the output image, default 0; must not exceed dW-1 / dH-1

The difference from SpatialConvolution is that for the same 3D input Tensor (nInputPlane x height x width), the output size (nOutputPlane x oheight x owidth) is computed differently:

owidth  = (width  - 1) * dW - 2*padW + kW + adjW
oheight = (height - 1) * dH - 2*padH + kH + adjH
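These formulas reverse the size change of a SpatialConvolution with the same kW/kH, stride, and padding. With stride greater than 1, several input sizes map to the same convolution output size, and adjW/adjH choose which of them to reconstruct. A sketch in plain Python (not Torch code; the function name is ours):

```python
# Compute the SpatialFullConvolution output plane size from the formulas above.
def full_conv_output_size(width, height, kW, kH, dW=1, dH=1,
                          padW=0, padH=0, adjW=0, adjH=0):
    owidth  = (width  - 1) * dW - 2 * padW + kW + adjW
    oheight = (height - 1) * dH - 2 * padH + kH + adjH
    return owidth, oheight

# With kW=kH=3, stride 2, pad 1, a forward convolution maps both a 7x7 and
# an 8x8 input to a 4x4 output:
#   floor((7 + 2*1 - 3) / 2 + 1) = 4   and   floor((8 + 2*1 - 3) / 2 + 1) = 4
# adjW/adjH select which input size the full convolution reconstructs:
print(full_conv_output_size(4, 4, 3, 3, dW=2, dH=2, padW=1, padH=1))
# -> (7, 7)
print(full_conv_output_size(4, 4, 3, 3, dW=2, dH=2, padW=1, padH=1,
                            adjW=1, adjH=1))
# -> (8, 8)
```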

For more details about full convolution, read the FCN paper: Fully Convolutional Networks for Semantic Segmentation.


References

  1. https://github.com/torch/nn/blob/master/doc/convolution.md


Origin www.cnblogs.com/ChenKe-cheng/p/11203910.html