The concept and implementation of upsampling and downsampling in deep learning

Reference blogs:

[Deep Learning] Upsampling, Downsampling, Convolution

torch.nn.functional.interpolate function

Concept

Upsampling

Simply put, upsampling enlarges an image by inserting new data between the existing pixels.

1. Interpolation: bilinear interpolation is generally used because it gives the best results. Its computation is more complex than other interpolation methods, but compared with a convolution it is negligible. Other interpolation methods include nearest-neighbor interpolation, trilinear interpolation, etc.;

2. Transposed convolution, or deconvolution (Transpose Conv): by filling the gaps between input feature-map elements with zeros and then performing a standard convolution, the output feature map can be made larger than the input. In contrast to interpolation, "upsampling" an image with a deconvolution is learnable (it uses convolution operations, whose parameters are trained).
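
To make the two options concrete, here is a minimal sketch (the shapes and channel counts are arbitrary, chosen only for illustration) that doubles the spatial size of a feature map both ways:

import torch
import torch.nn.functional as F

x = torch.randn(1, 16, 32, 32)  # (batch, channels, height, width)

# 1. Interpolation: a fixed rule with no learnable parameters
up_fixed = F.interpolate(x, scale_factor=2.0, mode="bilinear", align_corners=False)
print(up_fixed.shape)  # torch.Size([1, 16, 64, 64])

# 2. Transposed convolution: learnable upsampling;
#    kernel_size=4, stride=2, padding=1 exactly doubles H and W
deconv = torch.nn.ConvTranspose2d(16, 16, kernel_size=4, stride=2, padding=1)
up_learned = deconv(x)
print(up_learned.shape)  # torch.Size([1, 16, 64, 64])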

Downsampling

Simply put, downsampling shrinks the image.
There are two main purposes: 1. make the image fit the size of the display area; 2. generate a thumbnail of the image;

1. Implemented with a convolutional layer with a stride of 2: the size reduction during convolution serves feature extraction. Downsampling inevitably loses information, and a pooling layer is not learnable; replacing pooling with a learnable stride-2 convolutional layer can give better results, though it also adds some computation.

2. Implemented with a pooling layer with a stride of 2: pooling downsamples in order to reduce the dimensionality of the features, e.g. max-pooling and average-pooling. Max-pooling is currently the usual choice because it is simple to compute and preserves texture features better.
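
A minimal comparison of the two options (again with arbitrary shapes): both halve the height and width, but only the convolution has trainable weights:

import torch
import torch.nn.functional as F

x = torch.randn(1, 16, 32, 32)

# 1. Learnable: a stride-2 convolution
conv = torch.nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)
print(conv(x).shape)  # torch.Size([1, 16, 16, 16])

# 2. Fixed: stride-2 max-pooling
print(F.max_pool2d(x, kernel_size=2, stride=2).shape)  # torch.Size([1, 16, 16, 16])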

Implementation

The complete code is in DDIM/models/diffusion.py

import torch
import torch.nn as nn


class Upsample(nn.Module):
    def __init__(self, in_channels, with_conv):
        super().__init__()
        self.with_conv = with_conv
        if self.with_conv:
            self.conv = torch.nn.Conv2d(in_channels,
                                        in_channels,
                                        kernel_size=3,
                                        stride=1,
                                        padding=1)

    def forward(self, x):
        # double the spatial size with nearest-neighbor interpolation
        x = torch.nn.functional.interpolate(
            x, scale_factor=2.0, mode="nearest")
        if self.with_conv:
            x = self.conv(x)
        return x


class Downsample(nn.Module):
    def __init__(self, in_channels, with_conv):
        super().__init__()
        self.with_conv = with_conv
        if self.with_conv:
            # no asymmetric padding in torch conv, must do it ourselves
            self.conv = torch.nn.Conv2d(in_channels,
                                        in_channels,
                                        kernel_size=3,
                                        stride=2,
                                        padding=0)

    def forward(self, x):
        if self.with_conv:
            # pad only the right and bottom edges, then 3x3 stride-2 conv
            pad = (0, 1, 0, 1)
            x = torch.nn.functional.pad(x, pad, mode="constant", value=0)
            x = self.conv(x)
        else:
            # parameter-free alternative: 2x2 average pooling
            x = torch.nn.functional.avg_pool2d(x, kernel_size=2, stride=2)
        return x
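
A quick shape check of the two modules defined above (the sizes below are hypothetical, just to show the 2x effect in each direction):

up = Upsample(in_channels=64, with_conv=True)
down = Downsample(in_channels=64, with_conv=True)

x = torch.randn(1, 64, 16, 16)
print(up(x).shape)    # torch.Size([1, 64, 32, 32])
print(down(x).shape)  # torch.Size([1, 64, 8, 8])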

Upsampling

torch.nn.functional.interpolate performs the interpolation:

x = torch.nn.functional.interpolate(
    x, scale_factor=2.0, mode="nearest")

def interpolate(
    input: Tensor,
    size=None,
    scale_factor=None,
    mode: str = "nearest",
    align_corners=None,
    recompute_scale_factor=None,
    antialias: bool = False,
) -> Tensor

Parameters:

  • input (Tensor) – the input tensor.

  • size (int or Tuple[int] or Tuple[int, int] or Tuple[int, int, int]) – the output size.

  • scale_factor (float or Tuple[float]) – how many times larger the output is than the input. If the input is a tuple, this should also be specified as a tuple.

    • Note: only one of size and scale_factor may be specified.

  • mode (str) – the upsampling algorithm to use: 'nearest', 'linear', 'bilinear', 'bicubic', 'trilinear', or 'area'. Default: 'nearest'.

  • align_corners (bool, optional) – geometrically, the pixels of the input and output are treated as squares rather than points. If set to True, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels. If set to False, they are aligned by the corner points of their corner pixels, and out-of-boundary values are handled with edge-value padding, which makes the operation independent of the input size as long as scale_factor stays the same.
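
A small sketch of the align_corners difference, using 1-D 'linear' mode so the numbers stay readable; the input is just the two values 1 and 2:

import torch
import torch.nn.functional as F

x = torch.tensor([[[1.0, 2.0]]])  # (batch=1, channels=1, width=2)

# True: corner pixels are aligned by their centers,
# so the corner values 1.0 and 2.0 are preserved exactly
print(F.interpolate(x, size=4, mode="linear", align_corners=True))
# tensor([[[1.0000, 1.3333, 1.6667, 2.0000]]])

# False: alignment by pixel corners; out-of-bounds positions
# use edge-value padding
print(F.interpolate(x, size=4, mode="linear", align_corners=False))
# tensor([[[1.0000, 1.2500, 1.7500, 2.0000]]])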

Downsampling

import numpy as np
import torch

x = np.random.randint(1, 10, [1, 5, 5])  # (channels=1, height=5, width=5)
x = torch.FloatTensor(x)
print(x)

# Pooling approach: fixed, parameter-free
pooled = torch.nn.functional.avg_pool2d(x, kernel_size=2, stride=2)
print(pooled.shape)  # torch.Size([1, 2, 2]): floor(5 / 2) = 2

# Convolution approach: learnable
in_channels = 1
conv = torch.nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=2, padding=0)
print(conv)
x = conv(x)
print(x.shape)  # torch.Size([1, 2, 2]): (5 - 3) // 2 + 1 = 2


Origin: blog.csdn.net/weixin_42382758/article/details/130420658