Torch nn
Torch's nn module contains the common convolutional layers. This article introduces two convolutions whose names are similar but whose roles differ:
- Spatial Modules:
- SpatialConvolution: applies a 2D convolution over an input image
- SpatialFullConvolution: applies a 2D full convolution over an input image
The two convolution operations are defined as follows:
1. SpatialConvolution
module = nn.SpatialConvolution(nInputPlane, nOutputPlane, kW, kH, [dW], [dH], [padW], [padH])
Applies a 2D convolution over a multi-channel image. The input is expected to be a 3D Tensor (nInputPlane x height x width), where nInputPlane is what is usually called the number of channels.
The meaning of each parameter is:
- nInputPlane: number of channels of the input image
- nOutputPlane: number of channels output by the convolution layer (num_output in Caffe)
- kW: width of the convolution kernel window
- kH: height of the convolution kernel window
- dW: step of the convolution window in the width direction, default 1
- dH: step of the convolution window in the height direction, default 1
- padW, padH: zero-padding added to the input, default 0; setting them to (kW-1)/2 and (kH-1)/2 keeps the output feature-map plane the same size as the input plane (with stride 1)
For a 3D input Tensor (nInputPlane x height x width), the output feature map of the convolution layer has size nOutputPlane x oheight x owidth, where:
owidth = floor((width + 2*padW - kW) / dW + 1)
oheight = floor((height + 2*padH - kH) / dH + 1)
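As a quick sanity check, the size formulas can be evaluated directly. Below is a minimal Python sketch of the arithmetic above (the function name is made up for illustration; this is not Torch code):

```python
import math

def conv_output_size(width, height, kW, kH, dW=1, dH=1, padW=0, padH=0):
    """Output plane size of SpatialConvolution, per the formulas above."""
    owidth = math.floor((width + 2 * padW - kW) / dW + 1)
    oheight = math.floor((height + 2 * padH - kH) / dH + 1)
    return owidth, oheight

# A 3x3 kernel with padW = padH = (kW-1)/2 = 1 preserves the plane size:
print(conv_output_size(224, 224, 3, 3, 1, 1, 1, 1))   # (224, 224)
# A 7x7 kernel with stride 2 and pad 3 halves each dimension:
print(conv_output_size(224, 224, 7, 7, 2, 2, 3, 3))   # (112, 112)
```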
The output of the convolution layer is computed from the learned weights self.weight (a Tensor of size nOutputPlane x nInputPlane x kH x kW) and the bias self.bias (a Tensor of size nOutputPlane):
output[i][j][k] = bias[k]
+ sum_l sum_{s=1}^kW sum_{t=1}^kH weight[s][t][l][k] * input[dW*(i-1)+s][dH*(j-1)+t][l]
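The formula can be verified with a naive pure-Python implementation. This is an illustrative sketch with 0-based indexing and no padding, not the Torch API; the name conv2d_naive and the nested-list data layout are made up for the example:

```python
def conv2d_naive(inp, weight, bias, dW=1, dH=1):
    """Naive multi-channel 2D convolution following the formula above.

    inp:    nInputPlane planes, each a height x width list of lists
    weight: weight[k][l][t][u] -> output plane k, input plane l, row t, col u
    bias:   list of nOutputPlane biases
    """
    nInputPlane = len(inp)
    height, width = len(inp[0]), len(inp[0][0])
    nOutputPlane = len(weight)
    kH, kW = len(weight[0][0]), len(weight[0][0][0])
    oheight = (height - kH) // dH + 1
    owidth = (width - kW) // dW + 1
    out = [[[bias[k] for _ in range(owidth)] for _ in range(oheight)]
           for k in range(nOutputPlane)]
    for k in range(nOutputPlane):
        for i in range(oheight):
            for j in range(owidth):
                s = 0
                # Accumulate over all input planes and kernel positions.
                for l in range(nInputPlane):
                    for t in range(kH):
                        for u in range(kW):
                            s += weight[k][l][t][u] * inp[l][dH*i + t][dW*j + u]
                out[k][i][j] += s
    return out

inp = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]   # 1 x 3 x 3 input
w = [[[[1, 1], [1, 1]]]]                    # 1 x 1 x 2 x 2 kernel of ones
print(conv2d_naive(inp, w, [0]))            # [[[12, 16], [24, 28]]]
```

Each output element is the sum of the 2x2 window it covers, matching the formula with bias 0.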
2. SpatialFullConvolution
module = nn.SpatialFullConvolution(nInputPlane, nOutputPlane, kW, kH, [dW], [dH], [padW], [padH], [adjW], [adjH])
At first sight this looks basically the same as SpatialConvolution, with just two extra parameters. In other frameworks the operation is equivalently called "in-network upsampling", "fractionally-strided convolution", "backwards convolution", "deconvolution", or "upconvolution".
Simply put, it is a deconvolution or upsampling operation, and most of its parameters have the same meaning as in SpatialConvolution:
- nOutputPlane: number of channels output by the convolution layer (num_output in Caffe)
- kW: width of the convolution kernel window
- kH: height of the convolution kernel window
- dW: step of the convolution window in the width direction, default 1
- dH: step of the convolution window in the height direction, default 1
- padW, padH: zero-padding added to the input, default 0; setting them to (kW-1)/2 and (kH-1)/2 keeps the output feature-map plane the same size as the input plane (with stride 1)
- adjW, adjH: extra width and height added to the output image, default 0; must not exceed dW-1 and dH-1 respectively
The difference from SpatialConvolution is that for the same 3D input Tensor (nInputPlane x height x width), the output size (nOutputPlane x oheight x owidth) is computed differently:
owidth = (width - 1) * dW - 2*padW + kW + adjW
oheight = (height - 1) * dH - 2*padH + kH + adjH
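These formulas invert the size formulas of an ordinary convolution (when that division is exact), which is why the operation is also called "backwards convolution". A small Python sketch of the arithmetic (illustrative only, not Torch code; the function name is made up):

```python
def full_conv_output_size(width, height, kW, kH, dW=1, dH=1,
                          padW=0, padH=0, adjW=0, adjH=0):
    """Output plane size of SpatialFullConvolution, per the formulas above."""
    owidth = (width - 1) * dW - 2 * padW + kW + adjW
    oheight = (height - 1) * dH - 2 * padH + kH + adjH
    return owidth, oheight

# A stride-2 full convolution roughly doubles the plane size:
print(full_conv_output_size(7, 7, 4, 4, dW=2, dH=2, padW=1, padH=1))  # (14, 14)
# This inverts the corresponding SpatialConvolution: a 14x14 input with
# kW=4, dW=2, padW=1 gives floor((14 + 2 - 4)/2 + 1) = 7.
```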
For more details about full convolution, read the FCN paper: Fully Convolutional Networks for Semantic Segmentation.