torch.nn.Unfold
intuitive understanding
The function of torch.nn.Unfold: from a batched input, extract the sliding local blocks (patches), that is, the windows the convolution kernel (filter) slides over, and unfold each one in order. The number of features obtained per patch is channels * kernel width * kernel height, and L in the figure below is the total number of patches after sliding.
for example:
import torch

input1 = torch.randn(1, 3, 4, 6)
print(input1)
unfold1 = torch.nn.Unfold(kernel_size=(2, 3), stride=(2, 3))
patches1 = unfold1(input1)
print(patches1.shape)  # torch.Size([1, 18, 4]): 18 = 3 channels * 2 * 3 features per patch, 4 patches
print(patches1)
The red, blue, yellow, and green frames in the figure below are the patches obtained as the 2x3 window slides with a stride of 2x3. The total number of features in each patch is 2*3*3 = 18 (sliding-window height * sliding-window width * number of channels). The obtained patches1 unfolds the features of each patch in order, and the output size is (1, 18, 4).
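To make the unfolding order concrete, here is a small sketch using an arange tensor sized so that a single window covers the whole input; the values are chosen purely for illustration:

```python
import torch

# 1 sample, 2 channels, 2x3 spatial: values 0..11
x = torch.arange(12.0).reshape(1, 2, 2, 3)

# A single 2x3 window covers the whole input: exactly one patch
patches = torch.nn.Unfold(kernel_size=(2, 3))(x)
print(patches.shape)  # torch.Size([1, 12, 1])

# Features are ordered channel by channel, each channel row-major
print(patches[0, :, 0])  # tensor([0., 1., ..., 11.])
```

This shows that each column of the output stacks the window's values channel by channel, with each channel laid out row-major.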
official document
CLASS
torch.nn.Unfold(kernel_size, dilation=1, padding=0, stride=1)
- Function: Extract sliding local blocks from a batched input tensor.
Suppose a batched input tensor has size (N, C, ∗), where N is the batch dimension, C is the channel dimension, and ∗ stands for arbitrary spatial dimensions. This operation flattens each kernel_size-sized sliding block in the input spatial dimensions into a column, so the output size is (N, C × ∏(kernel_size), L), where C × ∏(kernel_size) is the number of values contained in each block (the product of the kernel_size area and the number of channels) and L is the number of such blocks:

L = ∏_d ⌊ (spatial_size[d] + 2 × padding[d] − dilation[d] × (kernel_size[d] − 1) − 1) / stride[d] + 1 ⌋

where spatial_size is the spatial dimensions of the input (the ∗ above) and d ranges over all spatial dimensions.
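As a sanity check, the formula for L can be evaluated directly for the (1, 3, 4, 6) example above and compared against what torch.nn.Unfold actually returns (the helper name num_blocks is illustrative, not part of PyTorch):

```python
import math
import torch

def num_blocks(spatial_size, kernel_size, stride, padding=(0, 0), dilation=(1, 1)):
    """Compute L, the number of sliding blocks, per the formula above."""
    L = 1
    for d in range(len(spatial_size)):
        L *= math.floor(
            (spatial_size[d] + 2 * padding[d]
             - dilation[d] * (kernel_size[d] - 1) - 1) / stride[d] + 1
        )
    return L

# Input spatial size (4, 6), kernel_size=(2, 3), stride=(2, 3)
L = num_blocks((4, 6), (2, 3), (2, 3))
print(L)  # 4

# Cross-check against torch.nn.Unfold itself
patches = torch.nn.Unfold(kernel_size=(2, 3), stride=(2, 3))(torch.randn(1, 3, 4, 6))
print(patches.shape[-1])  # 4
```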
Thus, an indexed output of the last dimension (column dimension) gives all values within a certain block.
The padding, stride, and dilation parameters specify how the sliding blocks are retrieved: stride controls the stride of the sliding window; padding controls the amount of implicit zero padding added on both sides of each spatial dimension before reshaping; dilation controls the spacing between kernel points (also known as the à trous algorithm).
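To see what dilation does, here is a small sketch: with dilation=2, a 2x2 kernel samples a 3x3 region, skipping every other element:

```python
import torch

x = torch.arange(25.0).reshape(1, 1, 5, 5)

# Without dilation, a 2x2 kernel reads adjacent elements
dense = torch.nn.Unfold(kernel_size=2)(x)

# With dilation=2 the same 2x2 kernel spans a 3x3 region,
# skipping every other element (the à trous trick)
dilated = torch.nn.Unfold(kernel_size=2, dilation=2)(x)

print(dense[0, :, 0])    # first block: elements 0, 1, 5, 6
print(dilated[0, :, 0])  # first block: elements 0, 2, 10, 12
```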
- parameter
kernel_size (int or tuple): the size of the sliding blocks
dilation (int or tuple, optional): a parameter that controls the stride of elements within the neighborhood. Default: 1
padding (int or tuple, optional): implicit zero padding added on both sides of the input. Default: 0
stride (int or tuple, optional): the stride of the sliding blocks in the input spatial dimensions. Default: 1
If kernel_size, dilation, padding, or stride are ints or tuples of length 1, their values will be replicated across all spatial dimensions.
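For instance, passing ints is equivalent to passing matching tuples, as this quick sketch shows:

```python
import torch

x = torch.randn(1, 3, 8, 8)

# kernel_size=2 is replicated to (2, 2); stride=2 to (2, 2)
a = torch.nn.Unfold(kernel_size=2, stride=2)(x)
b = torch.nn.Unfold(kernel_size=(2, 2), stride=(2, 2))(x)
print(torch.equal(a, b))  # True
```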
- shape:
- Input: (N, C, ∗)
- Output: (N, C × ∏(kernel_size), L)
- example
import torch
import torch.nn as nn

unfold = nn.Unfold(kernel_size=(2, 3))
input = torch.randn(2, 5, 3, 4)
output = unfold(input)
# each patch contains 30 values (2x3=6 vectors, each of 5 channels)
# 4 blocks (2x3 kernels) in total in the 3x4 input
output.size()
# torch.Size([2, 30, 4])
# Convolution is equivalent with Unfold + Matrix Multiplication + Fold (or view to output shape)
inp = torch.randn(1, 3, 10, 12)
w = torch.randn(2, 3, 4, 5)
inp_unf = torch.nn.functional.unfold(inp, (4, 5))
out_unf = inp_unf.transpose(1, 2).matmul(w.view(w.size(0), -1).t()).transpose(1, 2)
out = torch.nn.functional.fold(out_unf, (7, 8), (1, 1))
# or equivalently (and avoiding a copy),
# out = out_unf.view(1, 2, 7, 8)
(torch.nn.functional.conv2d(inp, w) - out).abs().max()
# ~0 up to floating-point error, confirming the equivalence
torch.nn.Fold
intuitive understanding
torch.nn.Fold is the inverse operation of torch.nn.Unfold: it restores the extracted sliding local blocks to the tensor form of the batch.
For example: applying a Fold with the same kernel size and stride to the patches1 output above gives input_restore, which is identical to input1, showing that Fold and Unfold are inverse operations here (the windows do not overlap).
fold1 = torch.nn.Fold(output_size=(4, 6), kernel_size=(2, 3), stride=(2, 3))
input_restore = fold1(patches1)
print(input_restore.shape)      # torch.Size([1, 3, 4, 6])
print(input_restore == input1)  # all True
print(input_restore)
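Note that Fold and Unfold are exact inverses only when the sliding windows do not overlap, as above. When windows overlap, Fold sums the overlapping values, so fold(unfold(x)) scales each element by the number of blocks covering it. A minimal sketch of this behavior, and how to normalize it away:

```python
import torch

x = torch.arange(16.0).reshape(1, 1, 4, 4)

# Overlapping windows: 2x2 kernel with stride 1 (9 blocks on a 4x4 input)
unfold = torch.nn.Unfold(kernel_size=2, stride=1)
fold = torch.nn.Fold(output_size=(4, 4), kernel_size=2, stride=1)

y = fold(unfold(x))
# Each element is multiplied by its coverage count: interior
# elements are covered by four blocks, corners by only one
print(y[0, 0])

# Dividing by the fold of an all-ones tensor recovers the input
coverage = fold(unfold(torch.ones_like(x)))
print(torch.allclose(y / coverage, x))  # True
```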
official document
CLASS
torch.nn.Fold(output_size, kernel_size, dilation=1, padding=0, stride=1)
- Function:
Contrary to Unfold, the extracted sliding local area blocks are restored to the tensor form of the batch.
- parameter
output_size (int or tuple): the shape of the output spatial dimensions
kernel_size (int or tuple): the size of the sliding blocks
dilation (int or tuple, optional): a parameter that controls the stride of elements within the neighborhood. Default: 1
padding (int or tuple, optional): implicit zero padding added on both sides of the input. Default: 0
stride (int or tuple, optional): the stride of the sliding blocks in the input spatial dimensions. Default: 1
- shape
- Input: (N, C × ∏(kernel_size), L) or (C × ∏(kernel_size), L)
- Output: (N, C, output_size[0], output_size[1], …) or (C, output_size[0], output_size[1], …)
- example
>>> fold = nn.Fold(output_size=(4, 5), kernel_size=(2, 2))
>>> input = torch.randn(1, 3 * 2 * 2, 12)
>>> output = fold(input)
>>> output.size()
torch.Size([1, 3, 4, 5])