【Paper Notes】—Low-Light Image Enhancement—Semi-Supervised—DRBN—CVPR 2020

【Introduction】

For the first time, a semi-supervised learning method is used for low-light image enhancement.

【Title】:From Fidelity to Perceptual Quality: A Semi-Supervised Approach for Low-Light Image Enhancement

【DOI】:10.1109/CVPR42600.2020.00313

【Venue】:CVPR 2020
【Authors】:Wenhan Yang (City University of Hong Kong), Shiqi Wang (City University of Hong Kong), Yuming Fang (Jiangxi University of Finance and Economics), Yue Wang (ByteDance), Jiaying Liu (Peking University)

【Paper link】:https://ieeexplore.ieee.org/document/9156559
【Code link】:https://github.com/flyywh/CVPR-2020-Semi-Low-Light
【Video link】:https://www.youtube.com/watch?v=J5ogMvSDdF4

【Motivation】

Bridging the gap between signal fidelity and perceptual quality.

【Method】

A Deep Recursive Band Network (DRBN) is proposed. DRBN first performs band representation learning: guided by paired data, it learns to recover each frequency band signal; this stage ensures signal fidelity and detail recovery. Then, under the perceptual guidance of unpaired data, the bands are recomposed to enhance the visual quality of the images, with high-quality images serving as a prior for human visual perception.

【Innovations】

  1. The first semi-supervised learning framework for low-light image enhancement.
  2. The framework is designed to extract a series of coarse-to-fine band representations. Trained end-to-end in a recursive manner, the estimates of these band representations benefit each other, which removes noise and corrects details.
  3. Deep band representations are recomposed via perceptual quality-guided adversarial learning. The discriminator's "real" images are unpaired images perceptually selected by mean opinion score (MOS), similar in spirit to EnlightenGAN.

【DRBN Network Structure】

The proposed Deep Recursive Band Network (DRBN) consists of two stages: recursive band learning and band recomposition.


The first stage: Recursive Band Learning. (recursive supervised learning on paired data)

Purpose: To ensure signal fidelity and detail recovery. (fidelity)

  1. In each recursion, a series of coarse-to-fine (s1 to s3) band representations is learned; the band signals are jointly inferred across recursions and then composed into the enhanced result.
  2. Residual learning is employed in both the feature and image domains.
  3. Recursive learning strengthens modeling capacity. The enhanced result of the previous recursion guides the next one; that is, each later recursion recovers only the residual signal, guided by the previous recursion's estimate, and is therefore better able to model structural details and suppress noise.
  4. The high-order s3 band inferred in the previous recursion affects the inference of the low-order s1 band in the current recursion: the previous recursion's output serves as the guiding input of the next, which ties all band estimates together into a joint estimation. The connection between low-order and high-order bands is thus bidirectional, and the high-order bands also provide useful guidance for recovering the low-order ones.
  5. Recursive estimation lets each band learn to correct its estimate based on the previous estimates of all bands.

A series of U-Net-like deep networks is constructed, called Band Learning Networks (BLNs). Each BLN projects the concatenation of the input y and the enhanced result of the previous recursion into feature space, then transforms the features through several convolutional layers. In the intermediate layers, the spatial resolution of the features is first reduced by strided convolutions and then restored by deconvolutions. Skip connections link features of the same spatial resolution from shallow to deep layers, which helps the local information contained in shallow features reach the output. Each BLN produces features at three scales: s1 = 1/4, s2 = 1/2, and s3 = 1 (full resolution).
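Below is a minimal PyTorch sketch of a BLN and the recursive forward pass described above. The three scales, the residual/skip structure, and T = 4 follow the text; the channel widths, block layout, and names (`BLN`, `DRBNStage1`) are illustrative assumptions rather than the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BLN(nn.Module):
    """U-Net-like Band Learning Network: maps [y, previous estimate] to
    residual band images at scales s1 = 1/4, s2 = 1/2, s3 = 1."""

    def __init__(self, in_ch=6, ch=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, ch, 3, padding=1), nn.ReLU(True))
        self.down1 = nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1)      # 1 -> 1/2
        self.down2 = nn.Conv2d(ch * 2, ch * 4, 3, stride=2, padding=1)  # 1/2 -> 1/4
        self.up1 = nn.ConvTranspose2d(ch * 4, ch * 2, 4, stride=2, padding=1)
        self.up2 = nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1)
        # project features back to the image domain at each scale
        self.to_img = nn.ModuleList(
            nn.Conv2d(c, 3, 3, padding=1) for c in (ch * 4, ch * 2, ch))

    def forward(self, x):
        f3 = self.enc(x)                 # full resolution
        f2 = F.relu(self.down1(f3))      # 1/2
        f1 = F.relu(self.down2(f2))      # 1/4 (coarsest)
        u2 = F.relu(self.up1(f1)) + f2   # skip connection at 1/2
        u3 = F.relu(self.up2(u2)) + f3   # skip connection at full resolution
        return [p(f) for p, f in zip(self.to_img, (f1, u2, u3))]


class DRBNStage1(nn.Module):
    """T recursions; each takes [y, previous enhanced estimate] and predicts
    residuals that refine the coarse-to-fine band estimates."""

    def __init__(self, T=4):
        super().__init__()
        self.blns = nn.ModuleList(BLN() for _ in range(T))

    def forward(self, y):
        up = lambda im: F.interpolate(im, scale_factor=2, mode="bilinear",
                                      align_corners=False)
        y_s1 = y_s2 = y_s3 = None
        for t, bln in enumerate(self.blns):
            guide = y if t == 0 else y_s3          # previous recursion's output
            r1, r2, r3 = bln(torch.cat([y, guide], dim=1))
            y_s1 = r1 if t == 0 else y_s1 + r1     # residual in the image domain
            y_s2 = up(y_s1) + r2                   # finer scales add residuals
            y_s3 = up(y_s2) + r3                   # on top of upsampled coarser bands
        return y_s1, y_s2, y_s3


# y1, y2, y3 = DRBNStage1()(torch.rand(1, 3, 256, 256))  # 1/4, 1/2, full res
```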

The first recursion of band learning proceeds as follows:

$$
\begin{aligned}
\{F^{1}_{s_1}, F^{1}_{s_2}, F^{1}_{s_3}\} &= f^{1}_{\mathrm{FE}}(y),\\
\hat{y}^{1}_{s_1} &= f_{s_1 \to I}\big(F^{1}_{s_1}\big),\\
\hat{y}^{1}_{s_2} &= f_{s_2 \to I}\big(F^{1}_{s_2}\big) + f_{\mathrm{up}}\big(\hat{y}^{1}_{s_1}\big),\\
\hat{y}^{1}_{s_3} &= f_{s_3 \to I}\big(F^{1}_{s_3}\big) + f_{\mathrm{up}}\big(\hat{y}^{1}_{s_2}\big),
\end{aligned}
$$

where $F^{1}_{s_i}$ are the features of the corresponding scale extracted from $y$ by the feature-extraction process $f^{1}_{\mathrm{FE}}(\cdot)$; $f_{s_i \to I}(\cdot)$ is the process of projecting the features back to the image domain at the corresponding scale; and $f_{\mathrm{up}}(\cdot)$ is the upsampling process. The image is first reconstructed at the coarsest scale $s_1$; then, at each finer scale, only the residual signal is predicted and added as part of the overall result.

Afterwards, at recursion $t$, the residual features and images are learned, guided only by the $(t{-}1)$-th estimates. The concatenation of $y$ and the previous estimate is taken as input:

$$
\begin{aligned}
\{F^{t}_{s_1}, F^{t}_{s_2}, F^{t}_{s_3}\} &= f^{t}_{\mathrm{FE}}\big([\,y,\ \hat{y}^{t-1}_{s_3}\,]\big),\\
\hat{y}^{t}_{s_1} &= f_{s_1 \to I}\big(F^{t}_{s_1}\big) + \hat{y}^{t-1}_{s_1},\\
\hat{y}^{t}_{s_2} &= f_{s_2 \to I}\big(F^{t}_{s_2}\big) + f_{\mathrm{up}}\big(\hat{y}^{t}_{s_1}\big),\\
\hat{y}^{t}_{s_3} &= f_{s_3 \to I}\big(F^{t}_{s_3}\big) + f_{\mathrm{up}}\big(\hat{y}^{t}_{s_2}\big).
\end{aligned}
$$

This formulation tightly connects all band features, forming a joint optimization over all bands. At the final recursion $T$ (set to 4 in this work), the reconstruction loss of the first stage is

$$
\mathcal{L}_{\mathrm{SRL}} = \big\|\hat{y}^{T}_{s_3} - x\big\|_2^2
+ \lambda_1 \big\|\hat{y}^{T}_{s_1} - f_{\mathrm{down}}(x, s_1)\big\|_2^2
+ \lambda_2 \big\|\hat{y}^{T}_{s_2} - f_{\mathrm{down}}(x, s_2)\big\|_2^2
+ \big(1 - \Phi(\hat{y}^{T}_{s_3}, x)\big),
$$

where $f_{\mathrm{down}}(\cdot, s_i)$ is the downsampling process given the scaling factor $s_i$, used to downsample the ground truth $x$ to the sizes of scales $s_1$ and $s_2$; $\Phi(\cdot, \cdot)$ computes the SSIM between two images; and $\lambda_1$, $\lambda_2$ are weighting parameters.
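A sketch of this loss under the reconstruction above, assuming images in [0, 1]. The SSIM term uses the third-party `pytorch_msssim` package (any differentiable SSIM works), and the weights `lam1`/`lam2` are placeholders, not the paper's settings.

```python
import torch.nn.functional as F
from pytorch_msssim import ssim  # assumed third-party SSIM implementation


def stage1_loss(y_s1, y_s2, y_s3, x, lam1=0.5, lam2=0.5):
    """MSE at three scales against (downsampled) ground truth x,
    plus an SSIM term at full resolution (the Phi term above)."""
    x_s1 = F.interpolate(x, scale_factor=0.25, mode="bilinear", align_corners=False)
    x_s2 = F.interpolate(x, scale_factor=0.5, mode="bilinear", align_corners=False)
    loss = F.mse_loss(y_s3, x)
    loss = loss + lam1 * F.mse_loss(y_s1, x_s1) + lam2 * F.mse_loss(y_s2, x_s2)
    return loss + (1.0 - ssim(y_s3, x, data_range=1.0))
```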

The second stage: Band Recomposition. (adversarial learning on unpaired data)

Objective: recompose the band representations via perceptual quality-guided adversarial learning to improve the perceptual quality of the enhanced low-light images. (perceptual quality)

Generator: the band representations from the first stage are fed into another U-Net-like network $F_{RC}(\cdot)$ (the generator), which models the recomposition process by generating coefficients $\{w_1, w_2, w_3\}$; these coefficients linearly rescale and fuse the frequency bands to obtain the enhanced image $\hat{y}$:

$$
\hat{y} = w_1 \cdot B_1 + w_2 \cdot B_2 + w_3 \cdot B_3,
$$

where $B_1 = f_{\mathrm{up}}\big(f_{\mathrm{up}}(\hat{y}^{T}_{s_1})\big)$ is the coarsest band brought to full resolution, and $B_2 = f_{\mathrm{up}}(\hat{y}^{T}_{s_2}) - B_1$ and $B_3 = \hat{y}^{T}_{s_3} - f_{\mathrm{up}}(\hat{y}^{T}_{s_2})$ are the residual bands between successive scales.
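A minimal sketch of the recomposition under the band definitions above. The small convolutional stand-in for $F_{RC}(\cdot)$ and its width are assumptions; the paper uses a U-Net-like generator.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Recompose(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        self.net = nn.Sequential(               # stand-in for the U-Net-like F_RC
            nn.Conv2d(9, ch, 3, padding=1), nn.ReLU(True),
            nn.Conv2d(ch, 9, 3, padding=1),     # 3 coefficient maps x 3 channels
        )

    def forward(self, y_s1, y_s2, y_s3):
        up = lambda im: F.interpolate(im, scale_factor=2, mode="bilinear",
                                      align_corners=False)
        b1 = up(up(y_s1))            # coarsest band, upsampled to full size
        b2 = up(y_s2) - b1           # mid-frequency residual band
        b3 = y_s3 - up(y_s2)         # high-frequency residual band
        w1, w2, w3 = self.net(torch.cat([b1, b2, b3], dim=1)).chunk(3, dim=1)
        return w1 * b1 + w2 * b2 + w3 * b3   # recomposed, perceptually tuned image
```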

Discriminator D:

  1. Measures the probability of human visual preference.
  2. High-quality images from the AVA dataset, selected by mean opinion score (MOS) [21], serve as a prior for human visual perception.
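For illustration, here is a PatchGAN-style stand-in for D; the paper's discriminator actually follows a global-local design (see the dataset notes below), so this stack is only an assumed interface for the stage-2 loss sketched next.

```python
import torch.nn as nn


def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 4, stride=2, padding=1),
                         nn.LeakyReLU(0.2, True))


discriminator = nn.Sequential(
    conv_block(3, 64), conv_block(64, 128), conv_block(128, 256),
    nn.Conv2d(256, 1, 3, padding=1),   # patch logits (use with BCE-with-logits)
)
```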

The loss function of the second stage combines a perceptual term with an adversarial term:

$$
\mathcal{L}_{\mathrm{BR}} = \big\|F_{P}(\hat{y}) - F_{P}(\hat{y}^{T}_{s_3})\big\|_2^2 + \lambda_{\mathrm{adv}}\,\mathcal{L}_{\mathrm{adv}}(\hat{y}),
$$

where $F_{P}(\cdot)$ is the process of extracting deep features from a pre-trained VGG network, which keeps the recomposed image close in content to the fidelity-oriented first-stage output, while $\mathcal{L}_{\mathrm{adv}}$ pushes $\hat{y}$ toward the distribution of the MOS-selected high-quality images.
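A hedged sketch of this objective: torchvision's VGG-19 stands in for the unspecified pre-trained VGG as $F_{P}(\cdot)$, and the feature-layer cut, the non-saturating GAN loss, and `lam_adv` are assumptions.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19, VGG19_Weights

vgg_feats = vgg19(weights=VGG19_Weights.DEFAULT).features[:16].eval()  # F_P(.)
for p in vgg_feats.parameters():
    p.requires_grad_(False)


def stage2_loss(y_hat, y_s3, disc, lam_adv=0.01):
    """Perceptual distance to the stage-1 output plus an adversarial term."""
    percep = F.mse_loss(vgg_feats(y_hat), vgg_feats(y_s3))
    logits = disc(y_hat)  # e.g. the patch discriminator sketched earlier
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return percep + lam_adv * adv
```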

【Datasets】

1. Paired low-light dataset LOL: in the first stage, images are cropped to 256×256 for training; in the second stage, they are cropped to 320×320 (see the snippet after this list).

2. Unpaired high-quality image dataset AVA [21]: the band recomposition stage learns from AVA via adversarial learning with a global-local discriminator mechanism (similar to EnlightenGAN). AVA contains roughly 250,000 aesthetically rated photos, from which high-MOS images are selected as the discriminator's "real" samples; the second-stage loss is then computed against them.
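For reference, the two crop settings as torchvision transforms (the sizes come from the notes above; the rest of the data pipeline is an assumption):

```python
from torchvision import transforms

stage1_crop = transforms.Compose([transforms.RandomCrop(256), transforms.ToTensor()])
stage2_crop = transforms.Compose([transforms.RandomCrop(320), transforms.ToTensor()])
```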


【Experimental Results】

Table 1: Quantitative results. The proposed method achieves the best PSNR, SSIM, and SSIM-GC values. (SSIM-GC denotes SSIM computed on gamma-corrected results.)

Figure 3: Qualitative results. Left: original outputs. Right: outputs after gamma correction for better visibility. The proposed method gives the best visual results.
