[Semantic Segmentation] Summary of Semantic Segmentation Upsampling Methods

In a semantic segmentation model, the backbone typically produces feature maps at several resolutions, and these feature maps are then fused to generate the prediction. In this process it is inevitable that the low-resolution feature maps must be upsampled. This post surveys the commonly used upsampling methods, gives NumPy implementations for several of them with correctness checks against OpenCV, and shows PyTorch usage for some of the operations.


Contents

1. Interpolation

1. Nearest neighbor interpolation

2. Bilinear interpolation

3. Other interpolation methods

2. PixelShuffle

3. Unpooling

4. Transposed convolution (deconvolution)

5. Appendix


1. Interpolation

Interpolation computes each inserted pixel value from the relationship between neighboring pixels. The simplest and most commonly used methods are nearest neighbor interpolation and bilinear interpolation (NumPy implementations are provided for both); other interpolation methods exist but are not covered in detail here.

1. Nearest neighbor interpolation

Nearest neighbor interpolation is the simplest interpolation method: each target pixel takes the value of the source pixel closest to it.

NumPy implementation, checked against OpenCV:

import cv2
from math import floor
import numpy as np


def interpolate_nearest(image, size):
    # size is (w, h); the output has shape (h, w, channels)
    new_img = np.zeros(shape=size[::-1] + (image.shape[-1], )).astype('uint8')
    scale_h = image.shape[0] / size[1]
    scale_w = image.shape[1] / size[0]
    for i in range(size[1]):
        for j in range(size[0]):
            # map each target pixel back to the nearest source pixel
            new_img[i, j] = image[int(floor(i * scale_h)), int(floor(j * scale_w))]
    return new_img


image = cv2.imread('512.png')
size = (256, 256)   # w, h
my_resized_image = interpolate_nearest(image, size)
cv_resized_image = cv2.resize(image, size, interpolation=cv2.INTER_NEAREST)
assert np.allclose(my_resized_image, cv_resized_image), "Image not equal between your implemented and opencv."
cv2.imshow("opencv", cv_resized_image)
cv2.imshow('my_op', my_resized_image)
cv2.waitKey(0)

Result: [figure: the resized image produced by OpenCV and by the NumPy implementation, side by side]
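The same operation is available in PyTorch; a minimal sketch using torch.nn.functional.interpolate in 'nearest' mode (which likewise maps each output pixel back to floor(i * scale)):

import torch
import torch.nn.functional as F

x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)
# nearest-neighbor upsampling of an [N, C, H, W] tensor to 8x8
y = F.interpolate(x, size=(8, 8), mode='nearest')
print(y.shape)  # torch.Size([1, 1, 8, 8])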

2. Bilinear interpolation

Bilinear interpolation computes the output pixel value from the distances between the interpolation position and the four surrounding source pixels. Let P = (x, y) be the point to interpolate, let Q11 = (x1, y1), Q12 = (x1, y2), Q21 = (x2, y1), Q22 = (x2, y2) be the surrounding source pixels, and let f(Q22) denote the pixel value at Q22. (Many deep learning models use this method.)

To compute the pixel value at P, first interpolate horizontally to obtain the values at R1 = (x, y1) and R2 = (x, y2):

f(R_1)=\frac{x_2-x}{x_2-x_1}f(Q_{11})+\frac{x-x_1}{x_2-x_1}f(Q_{21})

f(R_2)=\frac{x_2-x}{x_2-x_1}f(Q_{12})+\frac{x-x_1}{x_2-x_1}f(Q_{22})

Then interpolate vertically, obtaining the value at P from the R1 and R2 values of the previous step:

f(P)=\frac{y_2-y}{y_2-y_1}f(R_1)+\frac{y-y_1}{y_2-y_1}f(R_2)

NumPy implementation:

import cv2
from math import floor
import numpy as np


def interpolate_linear(image, size):
    h, w = image.shape[0:2]
    w_new, h_new = size
    h_scale = h / h_new
    w_scale = w / w_new

    h_index = np.linspace(0, h_new - 1, h_new)
    w_index = np.linspace(0, w_new - 1, w_new)
    wv, hv = np.meshgrid(w_index, h_index)
    # half-pixel alignment, matching OpenCV: src = (dst + 0.5) * scale - 0.5
    hv = (hv + 0.5) * h_scale - 0.5
    wv = (wv + 0.5) * w_scale - 0.5
    # hv = hv * h_scale  # corner alignment instead (does not match OpenCV)
    # wv = wv * w_scale
    hv[hv < 0] = 0
    wv[wv < 0] = 0

    # integer coordinates of the four neighbors, clipped at the image border
    h_down = hv.astype('int')
    w_down = wv.astype('int')
    h_up = h_down + 1
    w_up = w_down + 1
    h_up[h_up > (h - 1)] = h - 1
    w_up[w_up > (w - 1)] = w - 1

    pos_00 = image[h_down, w_down].astype('int')  # top-left
    pos_01 = image[h_up, w_down].astype('int')  # bottom-left
    pos_11 = image[h_up, w_up].astype('int')  # bottom-right
    pos_10 = image[h_down, w_up].astype('int')  # top-right

    # fractional parts: m is the vertical weight, n the horizontal weight
    m, n = np.modf(hv)[0], np.modf(wv)[0]
    m = np.expand_dims(m, axis=-1)
    n = np.expand_dims(n, axis=-1)
    a = pos_10 - pos_00
    b = pos_01 - pos_00
    c = pos_11 + pos_00 - pos_10 - pos_01
    # expanded form of the bilinear formula above
    image = np.round(a * n + b * m + c * n * m + pos_00).astype('uint8')
    return image


image = cv2.imread('512.png')
size = (256, 256)   # w, h
my_resized_image = interpolate_linear(image, size)
cv_resized_image = cv2.resize(image, size, interpolation=cv2.INTER_LINEAR)
print(np.mean(np.abs(my_resized_image.astype('int') - cv_resized_image.astype('int'))))  # rounding may make pixel values differ from OpenCV by 1
assert np.allclose(my_resized_image, cv_resized_image, atol=1), "Image not equal between your implemented and opencv."
cv2.imshow("opencv", cv_resized_image)
cv2.imshow('my_op', my_resized_image)
cv2.waitKey(0)

Result: [figure: the resized image produced by OpenCV and by the NumPy implementation, side by side]
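In PyTorch, the equivalent is torch.nn.functional.interpolate with mode='bilinear'; a minimal sketch (align_corners=False, the default, uses the same half-pixel mapping (dst + 0.5) * scale - 0.5 as the NumPy code above):

import torch
import torch.nn.functional as F

x = torch.rand(1, 3, 512, 512)
# bilinear downsampling of an [N, C, H, W] tensor to 256x256
y = F.interpolate(x, size=(256, 256), mode='bilinear', align_corners=False)
print(y.shape)  # torch.Size([1, 3, 256, 256])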

3. Other interpolation methods

There are many other interpolation methods, not listed here; refer to the OpenCV documentation.
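For reference, a minimal sketch of a few of the other cv2.resize interpolation flags ('512.png' is the same sample image used above):

import cv2

image = cv2.imread('512.png')
size = (256, 256)   # w, h
cubic = cv2.resize(image, size, interpolation=cv2.INTER_CUBIC)       # bicubic over a 4x4 neighborhood
area = cv2.resize(image, size, interpolation=cv2.INTER_AREA)         # pixel-area relation, suited to shrinking
lanczos = cv2.resize(image, size, interpolation=cv2.INTER_LANCZOS4)  # Lanczos over an 8x8 neighborhood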

2. PixelShuffle

PixelShuffle takes a feature map of shape [N, C, H, W] and upsamples it by a factor of R, rearranging channels into space to produce a feature map of shape [N, C/(R^2), H*R, W*R]. The implementation is quite simple, needing only reshape and transpose; the code below is aligned with torch.nn.PixelShuffle:

import torch
import numpy as np


def pixel_shuffle_np(x, up_factor):
    n, c, h, w = x.shape
    # split the channels into (c', r, r), then interleave r into the spatial dims
    new_shape = (n, c // (up_factor * up_factor), up_factor, up_factor, h, w)
    npresult = np.reshape(x, new_shape)
    npresult = npresult.transpose(0, 1, 4, 2, 5, 3)  # (n, c', h, r, w, r)
    oshape = [n, c // (up_factor * up_factor), h * up_factor, w * up_factor]
    npresult = np.reshape(npresult, oshape)
    return npresult


np.random.seed(10001)
image = np.random.rand(2, 16, 224, 224)
scale = 4
np_image = pixel_shuffle_np(image, scale)
torch_pixel_shuffle = torch.nn.PixelShuffle(scale)
torch_image = torch_pixel_shuffle(torch.from_numpy(image))
assert np.allclose(np_image, torch_image.numpy()), "Implemented PixelShuffle is not the same with torch.nn.PixelShuffle."
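As a sanity check, PixelShuffle is invertible; a minimal sketch continuing the script above with torch.nn.PixelUnshuffle (available in recent PyTorch versions):

# continuing the script above: image is [2, 16, 224, 224], torch_image is [2, 1, 896, 896]
unshuffle = torch.nn.PixelUnshuffle(scale)
restored = unshuffle(torch_image)
assert np.allclose(restored.numpy(), image), "PixelUnshuffle should invert PixelShuffle."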

3. Unpooling

Unpooling reverses max pooling: when the input feature map is pooled, the index of the maximum within each window is saved; during unpooling, each pooled value is written back to its saved index and every other position is filled with 0.

PyTorch code:

import torch
import numpy as np


inputs = np.array([1, 2, 6, 3, 3, 5, 2, 1, 1, 2, 2, 1, 7, 3, 4, 8], dtype='float').reshape([1, 1, 4, 4])
inputs = torch.from_numpy(inputs)
pool = torch.nn.MaxPool2d(2, stride=2, return_indices=True)  # also return the argmax indices
unpool = torch.nn.MaxUnpool2d(2, stride=2)  # scatters values back to those indices
output, indices = pool(inputs)
output = unpool(output, indices)
print(output)

Result:

tensor([[[[0., 0., 6., 0.],
          [0., 5., 0., 0.],
          [0., 0., 0., 0.],
          [7., 0., 0., 8.]]]], dtype=torch.float64)
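The same scatter can be written in NumPy; a minimal sketch (max_unpool2d_np is a hypothetical helper, assuming the indices address the flattened H*W plane, as PyTorch returns them):

import numpy as np


def max_unpool2d_np(values, indices, out_hw):
    # values, indices: [N, C, h, w]; each index addresses the flattened H*W output plane
    n, c = values.shape[:2]
    out = np.zeros((n, c, out_hw[0] * out_hw[1]), dtype=values.dtype)
    flat_vals = values.reshape(n, c, -1)
    flat_idx = indices.reshape(n, c, -1)
    for ni in range(n):
        for ci in range(c):
            # scatter each pooled value back to its argmax position; other positions stay 0
            out[ni, ci, flat_idx[ni, ci]] = flat_vals[ni, ci]
    return out.reshape(n, c, out_hw[0], out_hw[1])


# the pooled values and indices from the PyTorch example above
vals = np.array([[[[5., 6.], [7., 8.]]]])
idx = np.array([[[[5, 2], [12, 15]]]])
print(max_unpool2d_np(vals, idx, (4, 4)))  # reproduces the tensor printed above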

4. Transposed convolution (deconvolution)

The methods described above all have no learnable parameters. Is there an upsampling method that can learn task-specific parameters? There is: transposed convolution.

Start with a simple example: the input is 2x2 and the kernel is 2x2; each input value is multiplied by the kernel, and the shifted copies are accumulated into a 3x3 output, so the feature map has grown. (Conversely, a 3x3 input passed through a 2x2 convolution gives a 2x2 output; transposed convolution is the inverse of convolution in this shape sense.)

The transposed convolution above turns a 2x2 input into a 3x3 output. Can the output feature map grow further? Yes: after a stride is introduced (here the stride can be understood as the step taken on the output feature map), a 2x2 input with a 2x2 kernel yields a 4x4 output, as the sketch below shows.
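To make this accumulation concrete, a minimal single-channel NumPy sketch (conv_transpose2d_naive is a hypothetical helper; no padding or dilation):

import numpy as np


def conv_transpose2d_naive(x, kernel, stride=1):
    # each input value scales the kernel, and the shifted copies are accumulated
    h, w = x.shape
    kh, kw = kernel.shape
    out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * kernel
    return out


x = np.array([[1., 2.], [3., 4.]])
k = np.ones((2, 2))
print(conv_transpose2d_naive(x, k).shape)            # (3, 3): the first example
print(conv_transpose2d_naive(x, k, stride=2).shape)  # (4, 4): with stride 2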

In the same way, parameters such as padding and dilation can be introduced. For an input of shape [N, C_{in}, H_{in}, W_{in}] and an output of shape [N, C_{out}, H_{out}, W_{out}], the output size of a transposed convolution is:

H_{out}=(H_{in}-1)*stride[0]-2*padding[0]+dilation[0]*(kernel\_size[0]-1)+output\_padding[0]+1

W_{out}=(W_{in}-1)*stride[1]-2*padding[1]+dilation[1]*(kernel\_size[1]-1)+output\_padding[1]+1

Code practice (input feature map [1, 3, 50, 50], output feature map [1, 3, 98, 98]; check against the formula: (50-1)*2 - 2*2 + 1*(3-1) + 1 + 1 = 98):

import torch


x = torch.rand(1, 3, 50, 50)
transpose_conv = torch.nn.ConvTranspose2d(3, 3, kernel_size=3, stride=2, padding=2, output_padding=1)
y = transpose_conv(x)
print(x.shape, y.shape)
# Output:
# torch.Size([1, 3, 50, 50]) torch.Size([1, 3, 98, 98])

5. Appendix

All sample code


Origin blog.csdn.net/qq_40035462/article/details/123652157