Common upsampling methods in super-resolution tasks

1. Interpolation methods

1.1 Nearest Neighbor Interpolation

[Figure: schematic of one-dimensional nearest neighbor interpolation]
The figure above is a schematic of one-dimensional nearest neighbor interpolation. The midpoints between the coordinate points xi-1, xi, xi+1 (marked by the red dotted lines) split the axis into equal half-intervals, so every non-boundary coordinate point has a neighborhood of equal width. The value at each coordinate point then defines a piecewise-constant constraint: every interpolated point takes the value of the original coordinate point whose neighborhood it falls into. For example, if the interpolation point x lies in the neighborhood of xi, its value f(x) equals f(xi).

[Figure: top view of two-dimensional nearest neighbor interpolation]
The figure above is a quantitative top view of two-dimensional nearest neighbor interpolation. (x0, y0), (x0, y1), (x1, y0), (x1, y1) are coordinate points on the original image, with gray values Q11, Q12, Q21, Q22 respectively. For an interpolation point (x, y) whose gray value is unknown, the nearest neighbor rule assigns it the value of the closest original point; since (x, y) is closest to (x0, y0) (it lies in the neighborhood of (x0, y0)), its gray value is P = Q11.
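As a concrete illustration (not from the original post), here is a minimal NumPy sketch of integer-factor nearest neighbor upsampling, using the common convention that each output pixel copies the source pixel it maps onto after flooring:

```python
import numpy as np

def nearest_neighbor_upsample(img: np.ndarray, scale: int) -> np.ndarray:
    """Upsample a 2-D grayscale image by an integer factor with nearest neighbor."""
    h, w = img.shape
    out = np.empty((h * scale, w * scale), dtype=img.dtype)
    for i in range(h * scale):
        for j in range(w * scale):
            # Each output pixel copies the value of the closest source pixel.
            out[i, j] = img[i // scale, j // scale]
    return out

img = np.array([[10, 20],
                [30, 40]], dtype=np.uint8)
print(nearest_neighbor_upsample(img, 2))
# [[10 10 20 20]
#  [10 10 20 20]
#  [30 30 40 40]
#  [30 30 40 40]]
```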

1.2 Linear Interpolation

[Figure: qualitative schematic of one-dimensional linear interpolation]
The figure above is a qualitative schematic of one-dimensional linear interpolation. The values at the coordinate points xi-1, xi, xi+1, ... are connected pairwise by straight line segments, forming a continuous constraint function. For an interpolation point such as x, its value f(x) is read off this constraint function. Because the constraint between any two adjacent coordinate points is a straight line segment, the interpolation result is linear in x, which is why the method is called linear interpolation.

[Figure: quantitative schematic of one-dimensional linear interpolation]
The figure above is a quantitative schematic of one-dimensional linear interpolation, where x0 and x1 are the original coordinate points with gray values y0 and y1 respectively. For an interpolation point x whose gray value is unknown, the linear function through (x0, y0) and (x1, y1) gives

$$y=y_0+\left(x-x_0\right) \frac{y_1-y_0}{x_1-x_0}=y_0+\frac{\left(x-x_0\right) y_1-\left(x-x_0\right) y_0}{x_1-x_0}$$
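This formula translates directly into a tiny Python helper (illustrative only):

```python
def lerp(x, x0, y0, x1, y1):
    # Linear interpolation between (x0, y0) and (x1, y1); assumes x0 != x1.
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

print(lerp(1.5, x0=1.0, y0=10.0, x1=2.0, y1=20.0))  # 15.0
```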

1.3 Bilinear Interpolation

[Figure: qualitative oblique view of bilinear interpolation]
One-dimensional linear interpolation extends naturally to bilinear interpolation on two-dimensional images; each output value requires three one-dimensional linear interpolations. The figure above is a qualitative oblique view of the process. (x0, y0), (x0, y1), (x1, y0), (x1, y1) are pixel coordinate points on the original image, with gray values f(x0, y0), f(x0, y1), f(x1, y0), f(x1, y1). For an interpolation point (x, y) whose gray value is unknown, bilinear interpolation can first interpolate along the y axis: (x0, y0) and (x0, y1) give f(x0, y), and (x1, y0) and (x1, y1) give f(x1, y); a final one-dimensional interpolation along the x axis between (x0, y) and (x1, y) then gives the gray value f(x, y). Interpolating along the x axis first and the y axis second produces exactly the same result; only the order (and the intermediate figure) changes.
[Figure: quantitative schematic of bilinear interpolation, interpolating along the x axis first]
Let us use that second order. One-dimensional linear interpolation along the x axis between (x0, y0) and (x1, y0) gives f(x, y0), and between (x0, y1) and (x1, y1) gives f(x, y1):
$$\begin{aligned} f\left(x, y_0\right)&=\frac{x_1-x}{x_1-x_0} f\left(x_0, y_0\right)+\frac{x-x_0}{x_1-x_0} f\left(x_1, y_0\right)\\ f\left(x, y_1\right)&=\frac{x_1-x}{x_1-x_0} f\left(x_0, y_1\right)+\frac{x-x_0}{x_1-x_0} f\left(x_1, y_1\right) \end{aligned}$$
Then perform one-dimensional linear interpolation on the y-axis from (x, y0) and (x, y1) to obtain the gray value f(x, y) of the interpolation point (x, y):

$$f(x, y)=\frac{y_1-y}{y_1-y_0} f\left(x, y_0\right)+\frac{y-y_0}{y_1-y_0} f\left(x, y_1\right)$$

Combining the equations above gives the final bilinear interpolation result:
$$f(x, y)=\frac{\left(y_1-y\right)\left(x_1-x\right)}{\left(y_1-y_0\right)\left(x_1-x_0\right)} f\left(x_0, y_0\right)+\frac{\left(y_1-y\right)\left(x-x_0\right)}{\left(y_1-y_0\right)\left(x_1-x_0\right)} f\left(x_1, y_0\right)+\frac{\left(y-y_0\right)\left(x_1-x\right)}{\left(y_1-y_0\right)\left(x_1-x_0\right)} f\left(x_0, y_1\right)+\frac{\left(y-y_0\right)\left(x-x_0\right)}{\left(y_1-y_0\right)\left(x_1-x_0\right)} f\left(x_1, y_1\right)$$
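A small Python sketch of this three-step procedure (illustrative; the argument names are my own):

```python
def bilinear(x, y, x0, y0, x1, y1, f00, f10, f01, f11):
    """Bilinear interpolation at (x, y) inside the cell [x0, x1] x [y0, y1].

    f00 = f(x0, y0), f10 = f(x1, y0), f01 = f(x0, y1), f11 = f(x1, y1).
    Assumes x0 != x1 and y0 != y1.
    """
    # Two 1-D interpolations along the x axis.
    fxy0 = (x1 - x) / (x1 - x0) * f00 + (x - x0) / (x1 - x0) * f10
    fxy1 = (x1 - x) / (x1 - x0) * f01 + (x - x0) / (x1 - x0) * f11
    # One 1-D interpolation along the y axis.
    return (y1 - y) / (y1 - y0) * fxy0 + (y - y0) / (y1 - y0) * fxy1

# The midpoint of a unit cell averages the four corner values.
print(bilinear(0.5, 0.5, 0, 0, 1, 1, f00=10, f10=20, f01=30, f11=40))  # 25.0
```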

1.4 Bicubic Interpolation

Also known as cubic convolution interpolation, bicubic interpolation is the most commonly used two-dimensional interpolation method in numerical analysis. The gray value f(x, y) of the interpolation point (x, y) is obtained as a weighted average of the sixteen nearest sampling points on the rectangular grid; the weight of each sampling point is determined by its distance to the interpolation point, measured in both the horizontal and vertical directions. By contrast, bilinear interpolation uses only the four surrounding sample points.
[Figure: top view of bicubic interpolation on a two-dimensional image]
The figure above is a top-view schematic of bicubic interpolation on a two-dimensional image. Suppose the interpolation point to be computed has coordinates (i+u, j+v) and the gray values of the 16 surrounding pixel coordinate points (the grid) are known; the weight of each of the 16 points must be computed. Take the pixel coordinate point (i, j) as an example: its distances to the interpolation point (i+u, j+v) along the two axes are u and v respectively, so its weight is w(u) × w(v), where w(·) is the interpolation weight kernel (a predefined weight function). The weights of the remaining 15 pixel coordinate points are obtained in the same way. The gray value f(i+u, j+v) of the interpolation point is then computed as

$$f(i+u, j+v)=A \times B \times C$$
where each item is represented by a vector or matrix as:
$$\begin{gathered} A=\left[\begin{array}{llll} w(1+u) & w(u) & w(1-u) & w(2-u) \end{array}\right] \\ B=\left[\begin{array}{llll} f(i-1, j-1) & f(i-1, j+0) & f(i-1, j+1) & f(i-1, j+2) \\ f(i+0, j-1) & f(i+0, j+0) & f(i+0, j+1) & f(i+0, j+2) \\ f(i+1, j-1) & f(i+1, j+0) & f(i+1, j+1) & f(i+1, j+2) \\ f(i+2, j-1) & f(i+2, j+0) & f(i+2, j+1) & f(i+2, j+2) \end{array}\right]\\ C=\left[\begin{array}{llll} w(1+v) & w(v) & w(1-v) & w(2-v) \end{array}\right]^{T} \end{gathered}$$
The interpolation weight kernel w(·) is the BiCubic function
$$w(x)=\begin{cases} 1-2|x|^2+|x|^3, & |x|<1 \\ 4-8|x|+5|x|^2-|x|^3, & 1 \leq|x|<2 \\ 0, & |x| \geq 2 \end{cases}$$
Its function image is as follows:
[Figure: plot of the BiCubic weight kernel w(x)]
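A direct Python transcription of the kernel above (a sketch; this particular form corresponds to the commonly cited a = -1 variant of the BiCubic kernel):

```python
def bicubic_weight(x: float) -> float:
    """BiCubic interpolation kernel w(x) as written above."""
    x = abs(x)
    if x < 1:
        return 1 - 2 * x**2 + x**3
    elif x < 2:
        return 4 - 8 * x + 5 * x**2 - x**3
    return 0.0

# Weights of the 4 samples around an interpolation offset u = 0.4.
u = 0.4
weights = [bicubic_weight(1 + u), bicubic_weight(u), bicubic_weight(1 - u), bicubic_weight(2 - u)]
print(weights, sum(weights))  # the four weights sum to 1
```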

2. Deep Learning

2.1 Deconvolution / Transposed Convolution

For details, see: Convolution Operations in Deep Learning
The following figure shows a transposed convolution with a 2×2 kernel and stride=1.
[Figure: transposed convolution with a 2×2 kernel, stride=1]
Each element of the input tensor multiplies the entire kernel tensor, and the result is placed at the corresponding location of the output. That is, the first input element (0) scales the whole kernel and is written at the first position; the second input element (1) scales the kernel and is written one position further along, and so on. This yields four partial maps, and the final output is obtained by summing them. The stride in this example is 1, so the window slides by one position each time.
Summed up as a formula (where h × w is the kernel size):

$$Y[i: i+h,\; j: j+w] \mathrel{+}= X[i, j] * K$$
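The accumulation rule above can be written directly as a small NumPy sketch (illustrative, not the post's code; stride 1 and no padding assumed):

```python
import numpy as np

def trans_conv(X: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Transposed convolution (stride 1, no padding) via Y[i:i+h, j:j+w] += X[i, j] * K."""
    h, w = K.shape
    Y = np.zeros((X.shape[0] + h - 1, X.shape[1] + w - 1))
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            # Scale the whole kernel by one input element and accumulate it in place.
            Y[i:i + h, j:j + w] += X[i, j] * K
    return Y

X = np.array([[0.0, 1.0], [2.0, 3.0]])
K = np.array([[0.0, 1.0], [2.0, 3.0]])
print(trans_conv(X, K))
# [[ 0.  0.  1.]
#  [ 0.  4.  6.]
#  [ 4. 12.  9.]]
```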
where the size of Y is obtained by inverting the convolution output-size formula:

Convolution: $out = (Input - kernel + 2 \cdot padding) / stride + 1$

Deconvolution: $out = (Input - 1) \cdot stride + kernel - 2 \cdot padding$
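A quick PyTorch check of these size formulas (a sketch; the layer hyperparameters below are arbitrary choices of mine):

```python
import torch
from torch import nn

x = torch.randn(1, 1, 8, 8)                                            # 8x8 input
conv = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1)             # (8 - 3 + 2*1)/2 + 1 = 4
deconv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, padding=1)  # (4 - 1)*2 + 3 - 2*1 = 7

print(conv(x).shape)          # torch.Size([1, 1, 4, 4])
print(deconv(conv(x)).shape)  # torch.Size([1, 1, 7, 7])
```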

  • Stride
    Stride is the sliding step.
The figure below shows a transposed convolution with a 2×2 kernel and stride=2.
    [Figure: transposed convolution with a 2×2 kernel, stride=2]

  • Padding
    Unlike regular convolution, where padding is applied to the input, padding in a transposed convolution effectively acts on the output. For example, when the padding on both sides of the height and width is set to 1, the first and last rows and columns are removed from the output of the transposed convolution, as the sketch below shows.
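To make the stride and padding behavior concrete, here is a hedged PyTorch sketch; the 2×2 kernel values are my own choice, reused from the NumPy example above:

```python
import torch
from torch import nn

X = torch.tensor([[0.0, 1.0], [2.0, 3.0]]).reshape(1, 1, 2, 2)
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]]).reshape(1, 1, 2, 2)

# padding=1 trims the first/last row and column from the 3x3 stride-1 output,
# leaving only the former center element (value 4).
tconv = nn.ConvTranspose2d(1, 1, kernel_size=2, padding=1, bias=False)
tconv.weight.data = K
print(tconv(X))          # tensor([[[[4.]]]], ...)

# stride=2 spreads the per-element kernel copies two positions apart: (2-1)*2 + 2 = 4.
tconv2 = nn.ConvTranspose2d(1, 1, kernel_size=2, stride=2, bias=False)
tconv2.weight.data = K
print(tconv2(X).shape)   # torch.Size([1, 1, 4, 4])
```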

2.2 Unpooling

Unpooling is the inverse operation of pooling. Generally speaking, there are three unpooling methods:

  • Nearest Neighbor copies each value into all four cells of its 2×2 output block, giving a 2× enlargement in each direction (4× in area). This method is also described as the inverse of average pooling.
    [Figure: Nearest Neighbor unpooling]

  • Bed of Nails
    places each value in the upper-left corner of its corresponding output block and fills the rest with 0.
    [Figure: Bed of Nails unpooling]

  • MaxUnpooling
    requires recording the position of the maximum activation during max pooling. During unpooling, the activation value is placed back at the recorded position, all other positions are set to 0, and the original size is restored. This is only an approximation, because during pooling the values at the non-maximum positions were not actually 0. A hedged PyTorch sketch follows the figure below.
    [Figure: MaxUnpooling]
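A minimal PyTorch sketch of max pooling with recorded indices followed by max unpooling (the input values are arbitrary):

```python
import torch
from torch import nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)  # remember argmax positions
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.tensor([[[[ 1.0,  2.0,  3.0,  4.0],
                    [ 5.0,  6.0,  7.0,  8.0],
                    [ 9.0, 10.0, 11.0, 12.0],
                    [13.0, 14.0, 15.0, 16.0]]]])

y, indices = pool(x)        # y: 2x2 maxima, indices: where they came from
x_rec = unpool(y, indices)  # maxima restored to their original positions, zeros elsewhere
print(y)
print(x_rec)
```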

2.3 Sub-pixel convolution (PixelShuffle)

ESPCN first proposed the PixelShuffle operation. For details, see the super-resolution algorithm ESPCN: "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network".

The core idea of sub-pixel convolution: enlarging an image by a factor of r is equivalent to enlarging every pixel by a factor of r.

In the penultimate layer of the network, the convolution outputs r^2 feature maps of the same size as the original image (i.e. the number of output channels is r^2); the sub-pixel convolution layer then periodically rearranges them to obtain a reconstructed image of size (w × r, h × r).

[Figure: periodic rearrangement in sub-pixel convolution (ESPCN)]
As shown in the figure above, the 9 feature values circled in red on the penultimate layer are rearranged into the small block on the last layer indicated by the arrow; this is the reconstruction block produced, through the network, by the pixel circled in the original image. These nine values expand that original pixel by a factor of 3 in both height and width.
Despite its name, the sub-pixel convolution (Sub-pixel Convolution) layer itself performs no convolution; it simply takes the extracted feature maps and rearranges them periodically.
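A minimal PyTorch sketch of the ESPCN-style tail, pairing a convolution that outputs r^2 channels with PixelShuffle (the layer sizes here are my own illustrative choices):

```python
import torch
from torch import nn

r = 3                                  # upscaling factor
feat = torch.randn(1, 64, 32, 32)      # feature maps from earlier layers

# Penultimate layer: produce r^2 channels per output channel (here 1 output channel).
to_subpixel = nn.Conv2d(64, 1 * r * r, kernel_size=3, padding=1)
# Sub-pixel layer: a purely periodic rearrangement, with no learned parameters.
shuffle = nn.PixelShuffle(r)

out = shuffle(to_subpixel(feat))
print(out.shape)                       # torch.Size([1, 1, 96, 96])
```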

3. Application in SR

Several methods of upsampling in super-resolution tasks:

  1. Bicubic interpolation as the base upsampling, with convolutional layers for fine-tuning and correction. DCSCN
  2. A deconvolution layer with padding to enlarge the image. SRDenseNet
  3. Deconvolution with a stride of $\frac{1}{r}$ to enlarge the image. FSRCNN
  4. Sub-pixel convolution, an implicit layer that requires no extra computation and builds the output by rearrangement. ESPCN

