AI Practical Training Camp: Low-Level Vision and MMEditing

1. Image Super-Resolution

  • Increase image resolution
  • The high-resolution output must be consistent with the content of the low-resolution input
  • Restore image details and produce realistic content (bilinear or bicubic interpolation cannot recover the high-frequency details of an image)
  • Research on super-resolution was first published in 1984; the term "super-resolution" was first proposed in 1990
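The classical baseline mentioned above can be reproduced in a few lines. A minimal sketch (assuming PyTorch is available; tensor sizes are illustrative) of bicubic upscaling, which resamples existing pixels but cannot invent new high-frequency detail:

```python
import torch
import torch.nn.functional as F

# A random "low-resolution" image: batch of 1, 3 channels, 24x24.
lr = torch.rand(1, 3, 24, 24)

# Bicubic interpolation enlarges the image 4x, but it only smoothly
# resamples existing pixels -- no high-frequency detail is restored.
sr = F.interpolate(lr, scale_factor=4, mode="bicubic", align_corners=False)

print(sr.shape)  # torch.Size([1, 3, 96, 96])
```

Learning-based methods such as SRCNN, discussed below, aim to restore the detail this simple resampling cannot.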

Applications

  • Save bandwidth for transmitting high-definition images

Approaches to single-image super-resolution

  1. Learn the relationship between high- and low-resolution images from known data: prior knowledge
  2. Restore the high-definition image subject to this prior knowledge

SRCNN

SRCNN was the first deep-learning-based super-resolution algorithm, demonstrating the feasibility of deep learning for low-level vision. The model consists of only three convolutional layers and is trained end-to-end, with no additional pre- or post-processing steps.
SRCNN first enlarges the low-resolution image to the target size with bicubic interpolation, then fits a nonlinear mapping with a three-layer convolutional network, and finally outputs the high-resolution result. The authors interpret the three convolutional layers as three steps: image patch extraction and feature representation, nonlinear feature mapping, and final reconstruction.
This interpretation derives from sparse coding; the three steps correspond as follows:

  • Patch extraction: extract image patches and convolve them to obtain features, analogous to mapping image patches onto a low-resolution dictionary in sparse coding.
  • Nonlinear mapping: map low-resolution features to high-resolution features, analogous to finding the high-resolution dictionary atoms corresponding to each patch in dictionary learning.
  • Reconstruction: reconstruct the image from the high-resolution features, analogous to reconstruction from a high-resolution dictionary in dictionary learning.
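The three steps above map directly onto three convolutional layers. A minimal PyTorch sketch of the structure (layer sizes follow the 9-1-5 configuration commonly cited for SRCNN; this is an illustrative re-implementation, not the authors' code):

```python
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    """Sketch of SRCNN's three-layer structure."""

    def __init__(self, channels: int = 3):
        super().__init__()
        # 1) Patch extraction and representation: 9x9 conv -> 64 feature maps.
        self.extract = nn.Conv2d(channels, 64, kernel_size=9, padding=4)
        # 2) Nonlinear mapping: 1x1 conv maps 64 LR features to 32 HR features.
        self.map = nn.Conv2d(64, 32, kernel_size=1)
        # 3) Reconstruction: 5x5 conv rebuilds the image from HR features.
        self.reconstruct = nn.Conv2d(32, channels, kernel_size=5, padding=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The input is assumed to be ALREADY bicubic-upscaled to target size,
        # so spatial resolution does not change inside the network.
        x = self.relu(self.extract(x))
        x = self.relu(self.map(x))
        return self.reconstruct(x)

# Usage: a bicubic-upscaled 96x96 image goes in, the same size comes out.
model = SRCNN()
out = model(torch.rand(1, 3, 96, 96))
print(out.shape)  # torch.Size([1, 3, 96, 96])
```

Note that because the network operates at the already-enlarged resolution, every convolution is expensive; this is exactly the cost FSRCNN, below, is designed to remove.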

FSRCNN

FSRCNN improves on SRCNN with speed as the main goal:

  1. No interpolation pre-processing: convolution operates directly on the low-resolution image, reducing computation.
  2. A 1×1 convolution layer compresses the feature-map channels, further reducing the cost of the convolutions.
  3. After several convolution layers, the resolution is raised at the end by a transposed convolution.
FSRCNN is a fast image super-resolution convolutional network improved from SRCNN, mainly in the following respects:
  • FSRCNN drops the interpolation-based enlargement of the low-resolution image; the network takes the low-resolution input directly and uses a learnable deconvolution (transposed convolution) layer at the output for enlargement and reconstruction. This reduces computation and parameters while improving speed and quality.
  • In the feature-mapping stage, FSRCNN uses a shrink-and-expand structure: a 1×1 convolution first reduces the number of feature-map channels, several 3×3 convolutions then perform the nonlinear mapping, and a final 1×1 convolution restores the channel count. This increases network depth and nonlinearity while reducing parameters and computation.
  • FSRCNN uses PReLU as the activation function, which avoids ReLU's zero-gradient problem for negative inputs and improves model performance.
  • FSRCNN proposes a transfer-learning strategy: starting from a model pre-trained for one scaling factor, a model for another scaling factor can be obtained quickly by fine-tuning only the deconvolution layer.
  • FSRCNN achieves better results than SRCNN on various test sets, with a significant speedup.
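The structural points above can be sketched in PyTorch as follows (channel and layer counts follow the d=56, s=12, m=4 setting reported for FSRCNN; this is an illustrative sketch, not the authors' implementation):

```python
import torch
import torch.nn as nn

class FSRCNN(nn.Module):
    """Sketch of the FSRCNN structure: all heavy convolutions run on the
    small low-resolution image; a transposed convolution enlarges at the end."""

    def __init__(self, scale: int = 4, channels: int = 3,
                 d: int = 56, s: int = 12, m: int = 4):
        super().__init__()
        layers = [nn.Conv2d(channels, d, kernel_size=5, padding=2), nn.PReLU(d)]
        # 1x1 shrink: compress d channels down to s to cut computation.
        layers += [nn.Conv2d(d, s, kernel_size=1), nn.PReLU(s)]
        # Nonlinear mapping: m stacked 3x3 convolutions on the narrow features.
        for _ in range(m):
            layers += [nn.Conv2d(s, s, kernel_size=3, padding=1), nn.PReLU(s)]
        # 1x1 expand: restore the channel count before reconstruction.
        layers += [nn.Conv2d(s, d, kernel_size=1), nn.PReLU(d)]
        self.body = nn.Sequential(*layers)
        # Learnable upsampling replaces bicubic pre-enlargement.
        self.up = nn.ConvTranspose2d(d, channels, kernel_size=9, stride=scale,
                                     padding=4, output_padding=scale - 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.body(x))

# Usage: the RAW low-resolution image goes in; output is 4x larger.
model = FSRCNN(scale=4)
out = model(torch.rand(1, 3, 24, 24))
print(out.shape)  # torch.Size([1, 3, 96, 96])
```

The transfer-learning strategy mentioned above corresponds to fine-tuning only `self.up` when switching scaling factors, since the body operates entirely at the low resolution.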

2. Image Super-Resolution Loss Functions

The loss function for image super-resolution measures the difference between the reconstructed image and the ground-truth image and guides model optimization. Different loss functions lead to different trade-offs between peak signal-to-noise ratio (PSNR) and perceptual quality.

Common image super-resolution loss functions include the following:

  • Mean squared error (MSE) loss is the simplest and most common choice. It computes the pixel-wise difference between the reconstructed and ground-truth images and averages it. MSE yields high PSNR, but the reconstruction is often overly smooth and blurry, lacking high-frequency details and edge information.
  • Perceptual loss is based on high-level features. A pre-trained classification network (such as VGG) extracts feature representations of the reconstructed and ground-truth images, and the distance between these features is computed. Perceptual loss produces clearer, more natural reconstructions, at the cost of some PSNR.
  • Adversarial loss is based on a generative adversarial network (GAN). A discriminator network distinguishes reconstructed images from real ones, and the loss is derived from the probability that the reconstruction is judged real. Adversarial loss yields more realistic, sharper reconstructions, but can introduce artifacts and noise.
  • Edge loss is based on edge information. An edge detector (such as the Sobel operator) extracts edge maps of the reconstructed and ground-truth images, and the difference between the edges is computed. Edge loss helps the model restore edge details, improving sharpness and clarity.
  • Fourier-space loss is based on frequency-domain information. The reconstructed and ground-truth images are transformed from the spatial domain to the frequency domain with the Fourier transform, and the distance between the spectra is computed. This helps the model match the target frequency distribution, improving perceptual quality and visual effect.

Each loss function has its own strengths, weaknesses, and applicable scenarios. In practice, choose an appropriate loss or combine several according to your needs.
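The self-contained losses from the list above can be sketched directly in PyTorch (perceptual and adversarial losses are omitted because they require a pre-trained VGG and a discriminator network, respectively; the 0.1 and 0.01 combination weights below are arbitrary placeholders, not recommended values):

```python
import torch
import torch.nn.functional as F

def mse_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    # Pixel-wise L2 distance: maximizes PSNR but tends to over-smooth.
    return torch.mean((sr - hr) ** 2)

def edge_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    # Compare Sobel gradient magnitudes, emphasizing edge fidelity.
    kx = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    c = sr.shape[1]
    kx, ky = kx.repeat(c, 1, 1, 1), ky.repeat(c, 1, 1, 1)

    def grad(img: torch.Tensor) -> torch.Tensor:
        gx = F.conv2d(img, kx, padding=1, groups=c)
        gy = F.conv2d(img, ky, padding=1, groups=c)
        return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

    return torch.mean(torch.abs(grad(sr) - grad(hr)))

def fourier_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    # Compare amplitude spectra so the output matches the target
    # frequency distribution.
    return torch.mean(torch.abs(torch.fft.rfft2(sr).abs()
                                - torch.fft.rfft2(hr).abs()))

# Usage: combine several losses with weights, as the text suggests.
sr, hr = torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32)
total = mse_loss(sr, hr) + 0.1 * edge_loss(sr, hr) + 0.01 * fourier_loss(sr, hr)
print(float(total))
```

All three terms are differentiable, so `total.backward()` would propagate gradients to a super-resolution network producing `sr`.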


Origin blog.csdn.net/shengweiit/article/details/131215458