[Paper Notes]—Low Light Image Enhancement—Zero-reference—ZeroDCE—2020-CVPR

Original link: Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement — CSDN Blog

【Introduction】

Zero-reference: unsupervised; the network still requires training, but needs neither paired nor unpaired reference data.

【Title】: Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement

【DOI】:10.1109/CVPR42600.2020.00185

【Date】: uploaded to arXiv on 2020-01-19
【Venue】: CVPR 2020
【Authors】: Chunle Guo (Tianjin University), Chongyi Li (Tianjin University), Jichang Guo, Chen Change Loy, Junhui Hou, Sam Kwong, Runmin Cong

【paper】:https://arxiv.org/abs/2001.06826
【project】:https://li-chongyi.github.io/Proj_Zero-DCE.html
【code_Pytorch】:https://github.com/Li-Chongyi/Zero-DCE
【code_TensorFlow】:https://github.com/tuvovan/Zero_DCE_TF

【Ask a question】

The main data-driven methods for low-light enhancement fall into two camps: CNN-based and GAN-based.

  1. CNN-based: the vast majority train on paired (low-light, normal-light) data, usually synthesized by changing camera settings or by image retouching. Such datasets are collected manually or synthesized artificially, and models trained on them tend to generalize poorly.
  2. GAN-based: unsupervised GANs avoid paired data, but the unpaired training data still has to be carefully selected.

【solution】

This paper proposes a lightweight deep network for the low-light image enhancement problem. It recasts the task as image-specific curve estimation (image in, curves out); the estimated curves then adjust the input pixel-wise within its dynamic range to produce the enhanced image. With a set of non-reference loss functions that indirectly reflect enhancement quality, the network can be trained end-to-end without any reference images.

【Innovation】

  1. Explores a new learning strategy (zero-reference) that eliminates the need for paired or unpaired data;
  2. Inspired by curve adjustment in photo-editing software: using a CNN to learn curves instead of directly predicting output images is not only a new idea, it also keeps the network very lightweight.
  3. Designs non-reference loss functions to evaluate the output image quality indirectly.
  4. The proposed method is efficient and cost-effective.
    This comes from: the zero-reference learning framework + the lightweight network structure + the effective non-reference loss functions.

【Network structure】

The framework learns a set of best-fitting light-enhancement curves and applies them iteratively to all pixels of the input image's RGB channels to obtain the final enhanced image.

Light-Enhancement Curve (LE-Curve):

The author aims to design a class of curves that can automatically map a low-light image to its enhanced version. The curve parameters are self-adaptive and depend only on the input image. Such a curve must satisfy three requirements:

  1. Each pixel value of the enhanced image should be normalized to [0, 1], which avoids information loss from overflow truncation;
  2. The curve should be monotonic, so as to preserve the differences (contrast) between adjacent pixels;
  3. The curve should be as simple as possible and differentiable, so gradients can back-propagate through it.

To meet the above three requirements, the author designs a quadratic curve; its initial simple form is Eq. (1):

LE(I(x); α) = I(x) + α · I(x) · (1 − I(x))    (1)

Here x is the pixel coordinate, LE(I(x); α) is the enhanced value of input I(x), and α ∈ [−1, 1] is the trainable curve parameter that adjusts the curve's magnitude and controls the exposure level. Each pixel value is normalized to [0, 1] and all operations are pixel-wise. In practice the LE-Curve is applied to the R, G, and B channels separately, which better preserves the inherent colors and reduces the risk of over-saturation.

Figure 2(b) above shows the curve under different α settings; the designed curve satisfies the three requirements. In addition, the LE-Curve can both increase and decrease the dynamic range of the input, so it not only enhances low-light regions but can also suppress over-exposure.
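As a sanity check, the three design requirements can be verified numerically with a minimal NumPy sketch of this quadratic curve (assuming the form LE(I(x); α) = I(x) + α·I(x)·(1 − I(x)) from the paper):

```python
import numpy as np

def le_curve(x, alpha):
    """Eq. (1): LE(I(x); alpha) = I(x) + alpha * I(x) * (1 - I(x)).

    x     -- pixel values normalized to [0, 1]
    alpha -- curve parameter in [-1, 1]
    """
    return x + alpha * x * (1.0 - x)

# Check the three design requirements numerically:
x = np.linspace(0.0, 1.0, 101)
for alpha in (-1.0, -0.5, 0.5, 1.0):
    y = le_curve(x, alpha)
    assert y.min() >= 0.0 and y.max() <= 1.0   # 1) output stays in [0, 1]
    assert np.all(np.diff(y) >= 0.0)           # 2) monotonic
# 3) differentiable: d(LE)/dx = 1 + alpha*(1 - 2x), smooth everywhere
```

Positive α brightens (the curve lies above the identity line), negative α darkens, and the endpoints 0 and 1 are fixed points, which is what keeps the output inside [0, 1].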

Higher-Order Curve:

Introducing the idea of iterative optimization into the LE-Curve of Eq. (1) yields the higher-order LE-Curve:

LE_n(x) = LE_{n−1}(x) + α_n · LE_{n−1}(x) · (1 − LE_{n−1}(x))    (2)

Here n is the number of iterations; the author finds n = 8 to be good enough. With n = 1, Eq. (2) degenerates to Eq. (1). Figure 2(c) above shows an example of a higher-order curve; it has stronger adjustment ability (greater curvature) than the curves in Figure 2(b).

Pixel-Wise Curve:

The curves of Eqs. (1) and (2) can adjust the image over a wider dynamic range, but because one α applies to all pixels this is still a global adjustment (the same α_n regardless of pixel position or brightness), which can over-/under-enhance local regions. To refine the global adjustment into a local one, the author redefines α as a pixel-wise parameter map, i.e. each pixel of the input image gets its own curve:

LE_n(x) = LE_{n−1}(x) + A_n(x) · LE_{n−1}(x) · (1 − LE_{n−1}(x))    (3)

Here A_n is a parameter map with the same dimensions as the input image. The author assumes that pixels in a local region share the same intensity (and hence the same adjustment curve, i.e. locally consistent A), so neighboring pixels in the output still keep their monotonic relationship, and the pixel-wise higher-order curve of Eq. (3) still satisfies the three design requirements.
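A minimal NumPy sketch of this pixel-wise higher-order mapping (the toy image and the constant parameter maps below are illustrative assumptions, not the learned output of DCE-Net):

```python
import numpy as np

def enhance(img, alpha_maps):
    """Apply the pixel-wise higher-order curve of Eq. (3).

    img        -- image array with values in [0, 1] (e.g. H x W x 3)
    alpha_maps -- list of n parameter maps A_n, same shape as img, in [-1, 1]
    """
    x = img
    for A in alpha_maps:
        # LE_n = LE_{n-1} + A_n * LE_{n-1} * (1 - LE_{n-1})
        x = x + A * x * (1.0 - x)
    return x

# Toy example: 8 iterations with a constant brightening map (A = 0.5 everywhere).
rng = np.random.default_rng(0)
low = rng.uniform(0.0, 0.3, size=(4, 4, 3))       # a dark "image"
maps = [np.full((4, 4, 3), 0.5) for _ in range(8)]
out = enhance(low, maps)
assert out.min() >= 0.0 and out.max() <= 1.0      # output stays in range
assert np.all(out >= low)                          # dark pixels are brightened
```

In the real model each A_n varies per pixel and per channel, which is what turns the global curve into a local adjustment.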

Figure 3 shows an example of the estimated curve parameter maps for the three channels. The best-fitting parameter maps of different channels share similar adjustment trends but different values, reflecting both the correlation and the differences among the three channels of a low-light image. The parameter maps accurately track the brightness of different regions (such as the two glare spots on the wall), so enhancement can be performed directly by pixel-wise curve mapping: as Figure 3(e) shows, bright regions are preserved while dark regions are enhanced.

DCE-Net:

To learn the mapping from the input image to the best-fitting curve parameter maps, the author builds the Deep Curve Estimation Network (DCE-Net): the input is a low-light image and the output is a set of pixel-wise curve parameter maps for the higher-order curves. The network consists of 7 convolutional layers in a symmetric (U-Net-like) structure with skip connections. The first 6 layers use 3×3×32 kernels with stride 1, each followed by ReLU; down-sampling and batch normalization are discarded (the author argues they break the relations among neighboring pixels). The last convolution has 24 output channels (parameter maps for 8 iterations × 3 color channels) and is followed by a Tanh activation.

The whole network has 79,416 parameters and 5.21G FLOPs (for a 256×256×3 input).
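The reported parameter count can be checked with plain arithmetic; the per-layer channel layout below (i.e. which skip connections feed which layers) is an assumption that reproduces the 79,416 figure, not something stated explicitly in these notes:

```python
def conv_params(in_ch, out_ch, k=3):
    """Weights + biases of a k x k convolution layer."""
    return k * k * in_ch * out_ch + out_ch

# DCE-Net: 7 conv layers (3x3, stride 1) with symmetric skip
# concatenations, so the last three layers see 64 input channels.
layers = [
    conv_params(3, 32),    # conv1: RGB input
    conv_params(32, 32),   # conv2
    conv_params(32, 32),   # conv3
    conv_params(32, 32),   # conv4
    conv_params(64, 32),   # conv5: concat(conv3, conv4) -> 64 channels
    conv_params(64, 32),   # conv6: concat(conv2, conv5)
    conv_params(64, 24),   # conv7: concat(conv1, conv6); 24 = 8 iters x 3 channels
]
total = sum(layers)
print(total)  # prints 79416, matching the figure reported in the paper
```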

【Loss function】

To make the model's training process zero-reference, the author proposes a series of non-reference losses to evaluate the quality of the enhanced image.

Spatial Consistency Loss:

L_spa preserves the neighborhood differences (contrast) between the input image and its enhanced version, promoting spatial consistency of the enhanced image:

L_spa = (1/K) Σ_i Σ_{j∈Ω(i)} ( |Y_i − Y_j| − |I_i − I_j| )²

Here K is the number of local regions, Ω(i) is the 4-neighborhood (up, down, left, right) centered on region i, and Y and I are the average intensities of the local regions in the enhanced and input images, respectively. The region size is empirically set to 4×4; the loss is stable for other sizes as well.
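A rough NumPy sketch of this loss (using wrap-around neighbors via np.roll for brevity, which a faithful implementation would replace with proper boundary handling):

```python
import numpy as np

def l_spa(enhanced, inp, region=4):
    """Spatial consistency loss (sketch).

    Averages each image over non-overlapping region x region patches,
    then penalizes changes in the contrast between a patch and its four
    neighbors. Inputs are H x W intensity maps (channel-averaged images).
    """
    H, W = inp.shape
    h, w = H // region, W // region
    Y = enhanced[:h * region, :w * region].reshape(h, region, w, region).mean(axis=(1, 3))
    I = inp[:h * region, :w * region].reshape(h, region, w, region).mean(axis=(1, 3))
    loss, count = 0.0, 0
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # 4-neighborhood shifts
        dY = np.abs(Y - np.roll(Y, (di, dj), axis=(0, 1)))
        dI = np.abs(I - np.roll(I, (di, dj), axis=(0, 1)))
        loss += ((dY - dI) ** 2).sum()
        count += dY.size
    return loss / count

# An unchanged image has zero loss; altering local contrast increases it.
img = np.random.default_rng(1).uniform(size=(16, 16))
assert l_spa(img, img) == 0.0
assert l_spa(img * 2.0, img) > 0.0
```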

Exposure Control Loss:

To control under-/over-exposed regions, L_exp is designed to control the exposure level: it measures the distance between the average intensity of a local region and the well-exposedness level E. Following existing practice, the author sets E to a gray level in RGB color space, 0.6 in this experiment (performance is essentially unchanged for E in [0.4, 0.7]):

L_exp = (1/M) Σ_k | Y_k − E |

Here M is the number of non-overlapping local regions of size 16×16, and Y_k is the average intensity of region k in the enhanced image.
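A minimal NumPy sketch of this exposure loss on a single-channel intensity map:

```python
import numpy as np

def l_exp(enhanced, E=0.6, region=16):
    """Exposure control loss (sketch): mean |patch_mean - E| over
    non-overlapping region x region patches of the enhanced image."""
    H, W = enhanced.shape
    h, w = H // region, W // region
    Y = enhanced[:h * region, :w * region].reshape(h, region, w, region).mean(axis=(1, 3))
    return np.abs(Y - E).mean()

# A uniformly well-exposed image (intensity 0.6) gives (near-)zero loss;
# darker images are penalized more.
assert l_exp(np.full((32, 32), 0.6)) < 1e-12
assert l_exp(np.full((32, 32), 0.1)) > l_exp(np.full((32, 32), 0.5))
```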

Color Constancy Loss:

Based on the Gray-World color constancy assumption, L_col is designed to correct potential color casts in the enhanced image, and it also establishes relationships among the three adjusted channels:

L_col = Σ_{(p,q)∈ε} ( J_p − J_q )²,  ε = {(R,G), (R,B), (G,B)}

where Jp represents the average intensity of channel p of the enhanced image, and (p, q) represents a pair of channels.
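A minimal NumPy sketch of this color constancy loss:

```python
import numpy as np

def l_col(enhanced):
    """Color constancy loss (sketch): squared differences between the
    channel-wise mean intensities, over the pairs (R,G), (R,B), (G,B)."""
    J = enhanced.reshape(-1, 3).mean(axis=0)   # per-channel mean J_p
    pairs = [(0, 1), (0, 2), (1, 2)]
    return sum((J[p] - J[q]) ** 2 for p, q in pairs)

# A gray image satisfies the Gray-World assumption exactly;
# a strong color cast is penalized.
gray = np.full((8, 8, 3), 0.5)
cast = gray.copy()
cast[..., 2] = 0.9                              # blue cast
assert l_col(gray) == 0.0
assert l_col(cast) > 0.0
```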

Illumination Smoothness Loss:

To maintain the monotonic relationship between adjacent pixels, a smoothness loss is added on each curve parameter map A:

L_tvA = (1/N) Σ_n Σ_{c∈{R,G,B}} ( |∇x A_n^c| + |∇y A_n^c| )²

Here N is the number of iterations, and ∇x, ∇y denote the gradient operations in the horizontal and vertical directions, respectively.
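A NumPy sketch of this smoothness loss; here the gradients are squared separately per direction (a common total-variation variant), which may differ slightly from the paper's exact formulation:

```python
import numpy as np

def l_tv(alpha_maps):
    """Illumination smoothness (total-variation) loss (sketch):
    penalizes horizontal/vertical gradients of each curve parameter
    map A_n (H x W x 3), averaged over the N iterations."""
    loss = 0.0
    for A in alpha_maps:
        gx = A[:, 1:, :] - A[:, :-1, :]   # horizontal gradient
        gy = A[1:, :, :] - A[:-1, :, :]   # vertical gradient
        loss += (gx ** 2).mean() + (gy ** 2).mean()
    return loss / len(alpha_maps)

# Constant parameter maps are perfectly smooth; noisy maps are penalized.
flat = [np.full((8, 8, 3), 0.3)] * 8
noisy = [np.random.default_rng(2).uniform(-1, 1, (8, 8, 3)) for _ in range(8)]
assert l_tv(flat) == 0.0
assert l_tv(noisy) > 0.0
```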

Total Loss:

L_total = L_spa + L_exp + W_col · L_col + W_tvA · L_tvA

Here W_col and W_tvA are the loss weights (the released source code also places a weight on the exposure control loss).

【Datasets】

SICE, NPE, LIME, MEF, DICM, VV

【Experimental Results】

To exploit Zero-DCE's wide dynamic-range adjustment ability, the training set mixes low-light and over-exposed images (Part 1 of the SICE dataset: 3,022 images at different exposure levels, of which 2,422 are used for training), resized to 512×512.

From Figure 4: removing L_spa lowers contrast (e.g. in the cloud region); removing L_exp leaves low-brightness areas under-exposed; removing L_col causes severe color cast; removing L_tvA weakens the correlation between neighboring regions and produces obvious artifacts.

The method is compared with current SOTA methods on multiple datasets (NPE, LIME, MEF, DICM, VV, and Part 2 of SICE), where (f) denotes the unsupervised method.


Origin blog.csdn.net/qq_39751352/article/details/126463224