[Computer Vision] Application of Recursive Neural Networks in Image Super-Resolution

DRCN: Deeply-Recursive Convolutional Network for Image Super-Resolution

Summary

This paper is the first to apply a recursive network structure to image super-resolution. A deeply-recursive convolutional network is introduced to enlarge the receptive field and improve performance. Sharing the weights of the recursive module reduces the number of parameters, but it brings gradient exploding/vanishing problems, so the authors also propose two extensions: recursive supervision and skip connections.
Features:

  1. Recursive supervision: the feature map after each recursion is used to reconstruct the target high-resolution (HR) image. Since every recursion yields a different HR prediction, the authors combine all predictions produced at the different recursion depths to obtain a more accurate final prediction, and each intermediate prediction is supervised by the ground truth.
  2. Skip connection: in SR, the low-resolution input and the high-resolution output share a large amount of information, and the exact input signal may decay over many forward passes. The authors therefore connect the input (and the output of each recursion) directly to the reconstruction layer, which eases the memory burden on the network: it only has to learn the residual.
  3. The original LR image is upsampled by interpolation before entering the network, so the network works at the target resolution (a minimal sketch of this step follows the list).
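
As a minimal sketch of point 3, the bicubic pre-upsampling can be done with `torch.nn.functional.interpolate`; the scale factor and tensor sizes below are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

# Point 3: the LR image is interpolated to the target resolution (x2 here)
# before it is fed into the network.
lr = torch.rand(1, 1, 48, 48)   # dummy 48x48 single-channel LR patch
ilr = F.interpolate(lr, scale_factor=2, mode="bicubic", align_corners=False)
print(ilr.shape)                # torch.Size([1, 1, 96, 96])
```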

Method details

Basic model:

[Figure: DRCN basic model — embedding, inference, and reconstruction networks]
The base model consists of three sub-networks: an embedding network, an inference network, and a reconstruction network (a rough sketch is given after the list below).

  • The embedding network represents the given image as a set of feature maps.
  • The inference network deepens the model and maps the embedding features to a higher-level representation; note that the inference network is the part that is called recursively.
  • The reconstruction network generates the output image from the final feature maps of the inference network.
  • The input image in the figure above is the original LR image after interpolation upsampling.
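
A rough PyTorch sketch of these three sub-networks is given below; the layer widths, kernel sizes, and number of recursions are illustrative and not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DRCNBase(nn.Module):
    """Rough sketch of the DRCN base model: embedding -> recursive inference -> reconstruction."""
    def __init__(self, channels=64, num_recursions=9):
        super().__init__()
        self.num_recursions = num_recursions
        # Embedding network: maps the interpolated LR image to feature maps.
        self.embed = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Inference network: a single conv layer applied recursively (weights shared).
        self.infer = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Reconstruction network: maps the final feature maps back to an image.
        self.reconstruct = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, x):
        h = self.embed(x)
        for _ in range(self.num_recursions):   # the same weights are reused at every recursion
            h = self.infer(h)
        return self.reconstruct(h)

y = DRCNBase()(torch.rand(1, 1, 96, 96))
print(y.shape)  # torch.Size([1, 1, 96, 96])
```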

Problems with the basic model:

  • Vanishing and exploding gradients. Exploding gradients come from the multiplicative nature of chained gradients, which can grow exponentially under deep recursion; the vanishing-gradient problem is the opposite, with the gradient shrinking exponentially toward the zero vector. Both make it very difficult for a deeply recursive network to capture relationships between distant pixels.
  • It is not easy to preserve the original LR information through many recursions. In the SR task the output is very similar to the input, so the information in the LR image is important, and the exact input signal needs to be carried through to the later, deeper recursive layers.
  • Finding the optimal number of recursions is hard. If the recursion is too deep for a given task, the number of recursions has to be reduced, and finding the optimum requires training many networks with different recursion depths.

Advanced model

[Figure: DRCN advanced model with recursive supervision and skip connections]
Figure (a) is the final model of the paper:

  • Recursive-Supervision: every recursion is supervised, which mitigates the effect of vanishing/exploding gradients. The convolutions in the inference layer reuse the same kernel at every recursion, and the same reconstruction layer is used to predict an SR image from each recursion. The reconstruction layer therefore outputs D predicted images, and all of them are supervised simultaneously during training; this appears in the objective as an additional loss term, detailed in the formula below.
      a. The final output is a weighted sum of all D predicted images, with the weights learned by the network. Recursive supervision eases the difficulty of training the recursive network: summing the back-propagated gradients produced by the different prediction losses has a smoothing effect that effectively alleviates gradient explosion or vanishing.
      b. Furthermore, since supervision exploits the predictions of all intermediate recursions, choosing the exact optimal number of recursions becomes less important.

  • Skip-Connection: for image reconstruction, the input and output images are highly correlated, so the LR information can be passed directly to the SR reconstruction layer through a skip connection. This has two advantages: it avoids spending network capacity on carrying the input across many layers, and it largely preserves the complete low-frequency information (a sketch combining recursive supervision and the skip connection follows this list).
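
Below is a minimal sketch of how recursive supervision and the skip connection fit together. The layer sizes, recursion depth, and the softmax normalization of the ensemble weights are assumptions made for the sketch, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class DRCNAdvanced(nn.Module):
    """Sketch: every recursion's features pass through the *shared* reconstruction layer,
    the interpolated input is added back via a skip connection, and the D intermediate
    predictions are combined with learned weights."""
    def __init__(self, channels=64, num_recursions=9):
        super().__init__()
        self.num_recursions = num_recursions
        self.embed = nn.Sequential(nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.infer = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.reconstruct = nn.Conv2d(channels, 1, 3, padding=1)   # shared across recursions
        # one learnable ensemble weight per recursion (assumed softmax-normalized here)
        self.ensemble_w = nn.Parameter(torch.full((num_recursions,), 1.0 / num_recursions))

    def forward(self, x):
        h = self.embed(x)
        preds = []
        for _ in range(self.num_recursions):
            h = self.infer(h)
            preds.append(self.reconstruct(h) + x)   # skip connection: add the interpolated LR input
        w = torch.softmax(self.ensemble_w, dim=0)   # learned combination weights
        final = sum(w[d] * preds[d] for d in range(self.num_recursions))
        return final, preds

final, preds = DRCNAdvanced()(torch.rand(1, 1, 96, 96))
print(final.shape, len(preds))
```

During training, each element of `preds` as well as the weighted `final` output would be compared against the ground-truth HR image, which is the extra loss term mentioned above.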

Reference article

Super-resolution algorithm DRCN: Deeply-Recursive Convolutional Network for Image Super-Resolution

DRRN: Image Super-Resolution via Deep Recursive Residual Network

Summary

The authors point out the drawback of interpolating the image before it enters the network (it increases the amount of computation) and mention the Sub-Pixel strategy used in the ESPCN super-resolution network, which raises the resolution only at the end of the network, thereby reducing computation and easing the memory burden.
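
For reference, the Sub-Pixel (pixel-shuffle) upsampling mentioned here can be written in PyTorch with `nn.PixelShuffle`; the channel counts and the x2 scale are illustrative.

```python
import torch
import torch.nn as nn

scale = 2
# A conv layer produces scale**2 output channels at LR resolution, and PixelShuffle
# rearranges them into a single-channel image that is `scale` times larger, so the
# expensive convolutions all run in LR space.
to_subpixel = nn.Sequential(
    nn.Conv2d(64, 1 * scale ** 2, kernel_size=3, padding=1),
    nn.PixelShuffle(scale),
)
features = torch.rand(1, 64, 48, 48)      # LR-resolution feature map
print(to_subpixel(features).shape)        # torch.Size([1, 1, 96, 96])
```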

Global and local residual learning are introduced. Similar to ResNet, residual units are used (this is the local residual learning), so residual learning is performed every few layers, which helps reconstruct and preserve high-frequency information as the signal flows through the network; in addition, a large (global) residual connection is applied at the final output layer.
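
A compact sketch of the local + global residual idea is shown below; the channel width, the number of times the (weight-shared) residual unit is applied, and the pre-activation ordering are illustrative rather than DRRN's exact configuration.

```python
import torch
import torch.nn as nn

class RecursiveBlock(nn.Module):
    """Sketch of a recursive block: one residual unit whose weights are reused several
    times; every pass adds back the block input (local residual learning)."""
    def __init__(self, channels=64, num_units=3):
        super().__init__()
        self.num_units = num_units
        self.unit = nn.Sequential(                       # shared weights across all passes
            nn.ReLU(), nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(), nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        h = x
        for _ in range(self.num_units):
            h = x + self.unit(h)          # local residual: identity comes from the block input
        return h

class DRRNSketch(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.head = nn.Conv2d(1, channels, 3, padding=1)
        self.block = RecursiveBlock(channels)
        self.tail = nn.Conv2d(channels, 1, 3, padding=1)

    def forward(self, x):
        out = self.tail(self.block(self.head(x)))
        return out + x                    # global residual: the network only predicts the residual image

print(DRRNSketch()(torch.rand(1, 1, 96, 96)).shape)
```

In this sketch the identity branch of every local residual comes from the recursive block's input, and the final skip adds the input image back so only the residual has to be learned.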

Method

[Figures: DRRN network structure]

Residual recursive block

[Figure: structure of the residual recursive block]

Performance

[Figures: DRRN performance comparison]

LapSRN: Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution

Summary

Using the idea of the Laplacian pyramid, super-resolution is performed progressively in factors of 2, and residuals are combined to form the model's predictions.
The innovative points of the paper include:

  1. High accuracy: LapSRN uses the Charbonnier loss to improve reconstruction quality and reduce artifacts.
  2. High speed: like FSRCNN, it achieves real-time super-resolution on most test sets.
  3. Progressive reconstruction: the Laplacian pyramid structure produces several intermediate results in one forward pass, which can be used directly as super-resolution results at different scale factors.

Problems with previous methods:

  1. Some methods interpolate the low-resolution input to the target resolution with a predefined upsampling operation before super-resolving it. This adds unnecessary computational cost and often introduces visible reconstruction artifacts. FSRCNN (deconvolution) and ESPCN (sub-pixel convolution) accelerate SRCNN-style methods by upsampling only at the end, but their relatively small model capacity limits the complexity of the mappings they can learn.
  2. L2 loss is sensitive to noise; the paper argues that images produced with an L2 loss are overly smooth and inconsistent with human perception.
  3. Most methods reconstruct the final image with a single upsampling step, which is difficult for large factors such as 8x.
  4. Existing methods cannot produce intermediate results, so a separate model has to be trained for every desired upsampling ratio and computational budget.

Method

Comparison of similarities and differences in methods

[Figure: comparison of network structures]

Existing methods

[Figure: structures of existing methods]

Methods of this article

  • Green denotes element-wise addition, orange denotes recursive convolution layers, and blue denotes transposed convolutions.
  • Two modules:
    • Feature extraction: d convolution layers (extracting features at the level-s resolution) plus one transposed convolution (upsampling to the finer level-(s+1) resolution), after which the path splits into two branches: one reconstructs the residual image through a convolution layer, and the other recursively calls the d convolution layers to keep extracting features at level s+1. Because the d convolution layers operate at low resolution, the amount of computation is greatly reduced compared with methods that upsample the image to high resolution before feeding it to the network.
    • Upsampling: a transposed convolution (deconvolution) with stride 2 upsamples the image, and the upsampled image and the residual image are added element-wise (a per-level sketch follows this list).
      [Figure: LapSRN feature-extraction and image-reconstruction branches]
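
A simplified single-level sketch of the two branches described above (feature extraction at the current resolution, transposed-convolution upsampling, residual prediction, and element-wise addition); the depth d, channel width, LeakyReLU slope, and the x2 kernel settings are illustrative.

```python
import torch
import torch.nn as nn

class LapSRNLevel(nn.Module):
    """Sketch of one LapSRN pyramid level (x2)."""
    def __init__(self, channels=64, d=5):
        super().__init__()
        # Feature-extraction branch: d conv layers at the *current* (low) resolution...
        self.extract = nn.Sequential(*[
            layer
            for _ in range(d)
            for layer in (nn.Conv2d(channels, channels, 3, padding=1),
                          nn.LeakyReLU(0.2, inplace=True))
        ])
        # ...followed by a transposed convolution that upsamples the features x2.
        self.up_feat = nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1)
        # One branch predicts the residual image at the finer resolution.
        self.to_residual = nn.Conv2d(channels, 1, 3, padding=1)
        # Image branch: the image itself is upsampled x2 by another transposed convolution.
        self.up_img = nn.ConvTranspose2d(1, 1, 4, stride=2, padding=1)

    def forward(self, feat, img):
        feat = self.up_feat(self.extract(feat))   # level s -> level s+1 features
        residual = self.to_residual(feat)         # predicted residual image
        img = self.up_img(img) + residual         # element-wise addition
        return feat, img                          # both are passed on to the next level

feat, img = LapSRNLevel()(torch.rand(1, 64, 32, 32), torch.rand(1, 1, 32, 32))
print(feat.shape, img.shape)   # (1, 64, 64, 64) (1, 1, 64, 64)
```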

Recursive network model framework

[Figure: recursive LapSRN model framework]

Loss: Charbonnier

Notation:

  • $\hat{y}_s$ is the level-s HR image predicted by the network
  • $y_s$ is the level-s ground-truth HR image, obtained from the original high-resolution image by bicubic downsampling
  • $x_s$ is the image upsampled from the low-resolution input
  • $r_s$ is the level-s residual image

The paper models the ideal high-resolution output as $y_s = x_s + r_s$. In fact, my understanding is that $x_s$ and $r_s$ are both predicted by the network, so this should be the network's level-s HR prediction, $\hat{y}_s = x_s + r_s$.
So the loss should be written as:
$$\zeta(\hat{y},y;\theta)=\frac{1}{N}\sum_{i=1}^{N}\sum_{s=1}^{L}\rho\left(y_s^{(i)}-\hat{y}_s^{(i)}\right)=\frac{1}{N}\sum_{i=1}^{N}\sum_{s=1}^{L}\rho\left(y_s^{(i)}-\left(x_s^{(i)}+r_s^{(i)}\right)\right)$$
$\rho(x)=\sqrt{x^2+\epsilon^2}$ is the Charbonnier penalty function (a differentiable variant of the L1 norm), N is the batch size, and L is the number of pyramid levels. Under this loss function the output of each pyramid level is driven toward the HR image at its own scale, so 2x, 4x, and 8x super-resolution can be achieved at the same time.
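
A short sketch of how this multi-level Charbonnier loss might be written in PyTorch; the epsilon value and tensor shapes are illustrative, and the mean over pixels is an extra normalization added for readability.

```python
import torch

def charbonnier(x, eps=1e-3):
    # rho(x) = sqrt(x^2 + eps^2), a differentiable variant of the L1 norm
    return torch.sqrt(x * x + eps * eps)

def pyramid_loss(preds, targets):
    """preds / targets: lists of tensors, one per pyramid level (2x, 4x, ...).
    Each prediction y_hat_s = x_s + r_s is compared with the downsampled GT y_s."""
    loss = 0.0
    for y_hat_s, y_s in zip(preds, targets):
        loss = loss + charbonnier(y_s - y_hat_s).mean()   # mean over batch and pixels
    return loss

# toy usage: a two-level pyramid (x2 and x4) with a batch of N = 4 images
preds   = [torch.rand(4, 1, 64, 64), torch.rand(4, 1, 128, 128)]
targets = [torch.rand(4, 1, 64, 64), torch.rand(4, 1, 128, 128)]
print(pyramid_loss(preds, targets))
```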

Charbonnier loss (also called L1-Charbonnier loss, closely related to the pseudo-Huber loss) is a loss function used in image reconstruction and image generation tasks. It was proposed by Charbonnier et al. to handle smoothing and noise problems in image processing.
Charbonnier loss is mainly used as a replacement for the traditional L2 (squared-error) loss: squaring amplifies large errors, so L2 is sensitive to outliers and tends to make the model over-smooth noise and details. In contrast, Charbonnier loss responds less strongly to outliers and preserves detail better.
The calculation formula of Charbonnier loss is as follows:
$$L_{\text{Charbonnier}}(x, y) = \sqrt{(x - y)^2 + \epsilon^2}$$
where $x$ is the output generated by the model, $y$ is the ground-truth target, and $\epsilon$ is a small positive constant (typically a small value such as $10^{-3}$ or $10^{-6}$) that keeps the gradient well defined (avoiding division by zero) at $x = y$.
Charbonnier loss computes the difference between the predicted and target values and adds a small smoothing term under the square root, which keeps the loss smooth while remaining tolerant of outliers. By minimizing Charbonnier loss, the model can better cope with noise, preserve detail, and generate sharper, more realistic images.
It should be noted that Charbonnier loss is only one choice of loss function. Depending on the specific tasks and requirements, other types of loss functions can also be selected to optimize the model.

Performance

[Figure: LapSRN performance comparison]

Ablation experiment

[Figure: ablation results]

It can be seen that the pyramid structure and the robust (Charbonnier) loss have the most significant impact on performance. The residual connection's effect on final performance is less pronounced, but it makes the network converge faster and the loss curve less jittery.

Whether a design choice is useful should be judged not only by final performance but also by complexity, convergence speed, training difficulty, loss jitter, and other side benefits.

Reference article

Classic Review: LapSRN
[Super-Resolution][CVPR2017] LapSRN
