【torch.nn.Fold】 and 【torch.nn.Unfold】

torch.nn.Unfold

intuitive understanding

The function of torch.nn.Unfold: from a batch of samples, extract the sliding local blocks (patches), that is, the windows a convolution kernel would visit as it slides over the input, and flatten each one in order. Each patch yields (number of channels) * (kernel width) * (kernel height) features, and L denotes the total number of patches produced by the sliding.
for example:

import torch
input1=torch.randn(1,3,4,6)
print(input1)
unfold1=torch.nn.Unfold(kernel_size=(2,3),stride=(2,3))
patches1=unfold1(input1)
print(patches1.shape)
print(patches1)

Sliding the 2x3 window with a stride of 2x3 over the 4x6 input yields 4 patches (marked with red, blue, yellow, and green frames in the original figure). The total number of features for each patch is 2*3*3=18 (height of the sliding window * width of the sliding window * number of channels). The output patches1 flattens the features of each patch in order, so the output size is (1,18,4).
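To make the ordering inside each column concrete, here is a minimal sketch that checks every column of the output against the corresponding window of the input. It uses a deterministic `torch.arange` input instead of `torch.randn` (an assumption made only for readability); the names `x`, `patches`, and `corners` are illustrative.

```python
import torch

# Deterministic input so the patch contents are easy to inspect.
x = torch.arange(1 * 3 * 4 * 6, dtype=torch.float32).reshape(1, 3, 4, 6)

unfold = torch.nn.Unfold(kernel_size=(2, 3), stride=(2, 3))
patches = unfold(x)  # shape (1, 18, 4)

# Top-left corners of the four windows, in row-major sliding order.
corners = [(0, 0), (0, 3), (2, 0), (2, 3)]
for idx, (top, left) in enumerate(corners):
    window = x[0, :, top:top + 2, left:left + 3]  # shape (3, 2, 3)
    # Each column is one window flattened channel first, then row, then column.
    assert torch.equal(patches[0, :, idx], window.reshape(-1))

print(patches.shape)  # torch.Size([1, 18, 4])
```

Column `idx` of the output therefore holds the 18 values of patch `idx`, with the 6 values of channel 0 first, then channel 1, then channel 2.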


official document

CLASS
torch.nn.Unfold(kernel_size, dilation=1, padding=0, stride=1)
  • Function: Extract sliding local blocks from a batch of input tensors.

    Suppose the input tensor of a batch has size $(N, C, *)$, where $N$ is the batch dimension, $C$ is the channel dimension, and $*$ represents arbitrary spatial dimensions. This operation flattens each kernel_size-sized sliding block within the spatial dimensions of the input into a column, and the output has size $(N, C \times \prod(\text{kernel\_size}), L)$, where $C \times \prod(\text{kernel\_size})$ is the number of values contained in each block (the area of kernel_size multiplied by the number of channels) and $L$ is the total number of such blocks:

    $$L = \prod_d \left\lfloor \frac{\text{spatial\_size}[d] + 2 \times \text{padding}[d] - \text{dilation}[d] \times (\text{kernel\_size}[d] - 1) - 1}{\text{stride}[d]} + 1 \right\rfloor,$$

    where $\text{spatial\_size}$ is formed by the spatial dimensions of the input (the $*$ above), and $d$ ranges over all spatial dimensions.

    Thus, indexing the output at the last dimension (the column dimension) gives all values within a certain block.

    The padding, stride, and dilation parameters specify how the sliding blocks are retrieved.

    stride controls the stride of the sliding blocks. padding controls the amount of implicit zero-padding added on both sides, for padding number of points in each dimension, before reshaping.

    dilation controls the spacing between kernel points; this is also known as the à trous algorithm.

  • parameter

    • kernel_size(int or tuple): the size of the sliding blocks
    • dilation(int or tuple, optional): A parameter that controls the stride of elements in the neighborhood. Default: 1
    • padding(int or tuple, optional): Adds implicit zero padding on both sides of the input. Default: 0
    • stride(int or tuple, optional): The step size of the sliding blocks in the input spatial dimensions. Default: 1

    If kernel_size, dilation, padding, or stride is an int or a tuple of length 1, its value will be replicated across all spatial dimensions.

  • shape:

    • Input: $(N, C, *)$
    • Output: $(N, C \times \prod(\text{kernel\_size}), L)$
  • example

import torch
from torch import nn

unfold = nn.Unfold(kernel_size=(2, 3))
input = torch.randn(2, 5, 3, 4)
output = unfold(input)
# each patch contains 30 values (2x3=6 vectors, each of 5 channels)
# 4 blocks (2x3 kernels) in total in the 3x4 input
output.size()  # torch.Size([2, 30, 4])
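The block count in the example above can be checked directly against the formula for L. The following is a minimal sketch: `num_blocks` and `_as_tuple` are hypothetical helpers written for this illustration (they are not part of PyTorch), and the three parameter sets are arbitrary test cases. The helper also mimics the rule that an int argument is replicated across all spatial dimensions.

```python
import torch


def _as_tuple(value, n_dims):
    """Replicate an int across all spatial dimensions; pass tuples through."""
    return (value,) * n_dims if isinstance(value, int) else tuple(value)


def num_blocks(spatial_size, kernel_size, dilation=1, padding=0, stride=1):
    """Hypothetical helper: evaluate the formula for L, one factor per dimension."""
    n = len(spatial_size)
    kernel_size, dilation, padding, stride = (
        _as_tuple(v, n) for v in (kernel_size, dilation, padding, stride)
    )
    total = 1
    for s, k, d, p, st in zip(spatial_size, kernel_size, dilation, padding, stride):
        # Integer floor division implements the floor in the formula.
        total *= (s + 2 * p - d * (k - 1) - 1) // st + 1
    return total


x = torch.randn(2, 5, 7, 9)
cases = [
    dict(kernel_size=(2, 3)),
    dict(kernel_size=2, padding=1, stride=2),
    dict(kernel_size=(3, 2), dilation=(1, 2), padding=(1, 0), stride=(2, 1)),
]
for kwargs in cases:
    out = torch.nn.Unfold(**kwargs)(x)
    # The last output dimension should match the value given by the formula.
    assert out.shape[-1] == num_blocks(x.shape[2:], **kwargs)
    print(kwargs, "->", tuple(out.shape))
```

For the 3x4 input used in the example above, `num_blocks((3, 4), (2, 3))` gives (3-2+1) * (4-3+1) = 4, which agrees with the comment "4 blocks in total".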

# Convolution is equivalent with Unfold + Matrix Multiplication + Fold (or view to output shape)
inp = torch.randn(1, 3, 10, 12)
w = torch.randn(2, 3, 4, 5)
inp_unf = torch.nn.functional.unfold(inp, (4, 5))
out_unf = inp_unf.transpose(1, 2).matmul(w.view(w.size(0), -1).t()).transpose(1, 2)
out = torch.nn.functional.fold(out_unf, (7, 8), (1, 1))
# or equivalently (and avoiding a copy),
# out = out_unf.view(1, 2, 7, 8)
(torch.nn.functional.conv2d(inp, w) - out).abs().max()
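The same equivalence appears to carry over to strided and padded convolutions, as long as the same stride and padding are passed to unfold. A minimal sketch, assuming a 3x3 kernel with stride 2 and padding 1 (illustrative values that are not from the original example):

```python
import torch
import torch.nn.functional as F

inp = torch.randn(2, 3, 10, 12)
w = torch.randn(4, 3, 3, 3)  # (out_channels, in_channels, kH, kW)
stride, padding = 2, 1

# Columns of shape (N, C*kH*kW, L) = (2, 27, L).
cols = F.unfold(inp, kernel_size=3, padding=padding, stride=stride)

# (4, 27) @ (2, 27, L) broadcasts over the batch and gives (2, 4, L).
out_flat = w.view(w.size(0), -1) @ cols

# Output spatial size from the same formula that defines L.
h_out = (10 + 2 * padding - (3 - 1) - 1) // stride + 1  # 5
w_out = (12 + 2 * padding - (3 - 1) - 1) // stride + 1  # 6
out = out_flat.reshape(2, 4, h_out, w_out)

ref = F.conv2d(inp, w, stride=stride, padding=padding)
max_err = (ref - out).abs().max().item()
print(out.shape, max_err)  # the error should be at floating-point rounding level
```

Reshaping is enough here because each output position corresponds to exactly one column; no Fold call is needed when the target "kernel" is 1x1.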

torch.nn.Fold

intuitive understanding

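torch.nn.Fold is the reverse operation of torch.nn.Unfold: it combines the extracted sliding local blocks back into a batch tensor. Values from overlapping blocks are summed, so the original tensor is recovered exactly only when the blocks do not overlap.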
For example: we apply the Fold operation to the patches1 output above, using the same kernel size and stride. The resulting input_restore is identical to input1, showing that Fold and Unfold are inverse operations in this non-overlapping case.

fold1=torch.nn.Fold(output_size=(4,6),kernel_size=(2,3),stride=(2,3))
input_restore=fold1(patches1)
print(input_restore.shape)
print(input_restore==input1)
print(input_restore)
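The example above uses a stride equal to the kernel size, so no two patches share an element. With overlapping patches, Fold sums the contributions at each position, and the result is the input scaled by how many patches cover each element. A minimal sketch, assuming a stride of 1 (an illustrative choice); the `divisor` tensor is obtained by folding the unfolded version of an all-ones tensor:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 4, 6)
k, s = (2, 3), (1, 1)  # stride 1 makes neighbouring 2x3 windows overlap

patches = F.unfold(x, kernel_size=k, stride=s)
folded = F.fold(patches, output_size=(4, 6), kernel_size=k, stride=s)

# Count how many windows cover each position by folding an all-ones tensor.
ones = torch.ones_like(x)
divisor = F.fold(
    F.unfold(ones, kernel_size=k, stride=s),
    output_size=(4, 6), kernel_size=k, stride=s,
)

# Dividing by the coverage count recovers the original input.
restored = folded / divisor
print(divisor[0, 0])                           # 1 at the corners, up to 6 inside
print(torch.allclose(restored, x, atol=1e-5))  # True
```

Here `folded` itself is not equal to `x`; only after dividing by `divisor` does the round trip return the original values.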


official document

CLASS
torch.nn.Fold(output_size, kernel_size, dilation=1, padding=0, stride=1)
  • Function:

The reverse of Unfold: the extracted sliding local blocks are combined back into a batch tensor, with values summed wherever blocks overlap.

  • parameter
    • output_size(int or tuple): the shape of the output spatial dimension
    • kernel_size(int or tuple): the size of the sliding blocks
    • dilation(int or tuple, optional): A parameter that controls the stride of elements in the neighborhood. Default: 1
    • padding(int or tuple, optional): Adds implicit zero padding on both sides of the input. Default: 0
    • stride(int or tuple, optional): The step size of the sliding blocks in the input spatial dimensions. Default: 1
  • shape
    • Input: $(N, C \times \prod(\text{kernel\_size}), L)$ or $(C \times \prod(\text{kernel\_size}), L)$
    • Output: $(N, C, \text{output\_size}[0], \text{output\_size}[1], \ldots)$ or $(C, \text{output\_size}[0], \text{output\_size}[1], \ldots)$
  • example
>>> fold = nn.Fold(output_size=(4, 5), kernel_size=(2, 2))
>>> input = torch.randn(1, 3 * 2 * 2, 12)
>>> output = fold(input)
>>> output.size()
torch.Size([1, 3, 4, 5])


Origin blog.csdn.net/zyw2002/article/details/132177697