Detailed explanation of the paper - "Deep Color Consistent Network for Low-Light Image Enhancement"

Abstract

Low-light image enhancement (LLIE) studies how to refine illumination to obtain natural normal-light images. Current LLIE methods mainly focus on improving illumination, but do not reasonably incorporate color information into the enhancement process, and thus ignore color consistency. As a result, there is often a color difference between the enhanced image and the ground truth.
To address this problem, we propose a new deep color consistent network, called DCC-Net, to preserve color consistency in LLIE. A new "divide and conquer" collaborative strategy is proposed, which preserves color information while enhancing illumination.
Specifically, the decoupling strategy of DCC-Net decouples each color image into two main components: a grayscale image and a color histogram. The grayscale image is used to generate reasonable structure and texture, while the color histogram helps maintain color consistency; the two cooperate to complete the LLIE task.
To match the color features and content features of the image, and to reduce the color consistency gap between the enhanced image and the ground truth, we also design a new pyramid color embedding (PCE) module, which better embeds color information into the LLIE process.
Extensive experiments on six real datasets show that the images enhanced by DCC-Net are more natural and colorful, and that DCC-Net outperforms current state-of-the-art methods.

3. Proposed Method

In this section, we introduce the framework (see Figure 2) and the details of DCC-Net, which aims to maintain color consistency and naturalness while recovering normal-light images. DCC-Net has three subnets (i.e., G-Net, C-Net, R-Net) and a pyramid color embedding (PCE) module.
Figure 2: The overall framework of DCC-Net. There are three subnets: G-Net, C-Net, and R-Net. The goal of G-Net is to restore a grayscale image with rich content information, C-Net focuses on learning the color distribution, and R-Net combines the grayscale image and the color information to restore a natural, color-consistent normal-light image.

3.1 Network Structure

  • G-Net
    Given an input low-light image, the goal of G-Net is to predict the grayscale image of the normal-light image, which contains rich structure and texture information but no color information. This process is expressed as
    $$G_{\text{pre}} = GNet(S_{\text{low}})$$
    where $G_{\text{pre}}$ is the predicted grayscale image, $S_{\text{low}}$ is the input low-light image, and $GNet(\cdot)$ denotes the transformation of G-Net. Specifically, G-Net uses an encoder-decoder pipeline similar to the classic U-Net.
    For G-Net, we use the $l_1$ loss to reconstruct the grayscale image:
    $$l_g = \frac{1}{H \times W}\left\|G_{\text{pre}} - G_{\text{high}}\right\|_1,$$
    where $l_g$ is the grayscale reconstruction loss, $G_{\text{high}}$ is the grayscale image of the normal-light image, and $H$ and $W$ are the height and width of $G_{\text{high}}$. G-Net therefore does not consider color information, and instead concentrates on recovering texture and structure.
  • C-Net
    The color histogram is a color feature widely used in image retrieval systems [9]. It mainly describes the proportions of the different colors in the whole image, without regard to their spatial positions. In this paper, we compute the color histogram in the RGB color space. Specifically, the color histogram of an image is an $N \times 256$ matrix, where $N = 3$ corresponds to the three color channels (R, G, B) and 256 matches the range of pixel values.
    C-Net is designed around the color histogram for color feature learning. The goal of C-Net is to obtain color features consistent with those of the normal-light image (see Figure 2). C-Net also uses an encoder-decoder pipeline, converting the input low-light image into a predicted color histogram:
    $$C_{\text{pre}} = CNet(S_{\text{low}})$$
    where $C_{\text{pre}}$ is the predicted color histogram and $CNet(\cdot)$ denotes the transformation of C-Net. To better reconstruct the color histogram, we again use the $l_1$ loss to constrain C-Net:
    $$l_c = \frac{1}{N \times 256}\left\|C_{\text{pre}} - C_{\text{high}}\right\|_1,$$
    where $l_c$ is the color histogram reconstruction loss and $C_{\text{high}}$ is the ground-truth color histogram of the normal-light image. Note that the color histogram cannot describe the content and details of the image; that is, C-Net puts all of its attention on learning consistent color features, which benefits the enhancement.
  • R-Net
    R-Net combines the grayscale image and the color histogram obtained by G-Net and C-Net to collaboratively restore the normal-light image. R-Net transforms the input low-light image, the predicted grayscale image, and the predicted color histogram into a normal-light image:
    $$S_{\text{pre}} = RNet(S_{\text{low}}, G_{\text{pre}}, C_{\text{pre}})$$
    where $S_{\text{pre}}$ is the enhanced image.
    To reconstruct the normal-light image at the pixel level, we use the color image reconstruction loss $l_r$, defined as
    $$l_r = \frac{1}{N \times H \times W}\left\|S_{\text{pre}} - S_{\text{high}}\right\|_1,$$
    where $N$, $H$, and $W$ are the number of channels, the height, and the width of the normal-light image $S_{\text{high}}$.
    At the structural level, we use the SSIM loss as a constraint:
    $$l_{\text{ssim}} = 1 - \operatorname{SSIM}(S_{\text{pre}}, S_{\text{high}})$$
    where $\operatorname{SSIM}(\cdot)$ is defined as
    $$\operatorname{SSIM}(x, y) = \frac{2\mu_x\mu_y + c_1}{\mu_x^2 + \mu_y^2 + c_1} \cdot \frac{2\sigma_{xy} + c_2}{\sigma_x^2 + \sigma_y^2 + c_2}$$

where $x, y \in \mathbb{R}^{H \times W \times 3}$ are the two images being compared, $\mu_x, \mu_y \in \mathbb{R}$ are their means, $\sigma_x^2, \sigma_y^2 \in \mathbb{R}$ are their variances, $\sigma_{xy}$ is their covariance, and $c_1$ and $c_2$ are two small constants that prevent the denominators from being zero.
In addition, the total variation loss $l_{tv}$ is used as a regularization term to preserve the smoothness of the enhanced image. (A sketch of all five losses in code is given below.)
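
To make these definitions concrete, here is a minimal PyTorch-style sketch of the loss terms. This is an illustration under simplifying assumptions, not the authors' released code: `rgb_histogram` is one plausible way to build the $N \times 256$ descriptor (normalized so bins represent color proportions), and `ssim_loss` uses the global (non-windowed) form of the SSIM formula above.

```python
import torch

def rgb_histogram(img):
    """Per-channel 256-bin histogram (the N x 256 color descriptor).
    img: (3, H, W) tensor with values in [0, 1]."""
    hist = torch.stack([torch.histc(img[c] * 255.0, bins=256, min=0.0, max=255.0)
                        for c in range(3)])      # (3, 256)
    return hist / img[0].numel()                 # normalize by pixel count

def l1_loss(pred, target):
    """Mean absolute error, i.e. the (1/size) * ||pred - target||_1
    form shared by l_g, l_c, and l_r."""
    return (pred - target).abs().mean()

def ssim_loss(pred, target, c1=0.01 ** 2, c2=0.03 ** 2):
    """l_ssim = 1 - SSIM, with SSIM computed globally over the image."""
    mu_x, mu_y = pred.mean(), target.mean()
    var_x, var_y = pred.var(), target.var()
    cov_xy = ((pred - mu_x) * (target - mu_y)).mean()
    ssim = ((2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)) \
         * ((2 * cov_xy + c2) / (var_x + var_y + c2))
    return 1.0 - ssim

def tv_loss(img):
    """Total variation regularizer: penalizes differences between
    neighboring pixels to keep the enhanced image smooth."""
    dh = (img[..., 1:, :] - img[..., :-1, :]).abs().mean()
    dw = (img[..., :, 1:] - img[..., :, :-1]).abs().mean()
    return dh + dw
```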

3.2. Pyramid Color Embedding (PCE)

The PCE module is designed to better embed color information into R-Net, as shown in Figure 3. PCE consists of six pyramid-structured color embedding (CE) modules. CE achieves dynamic embedding of color features, and its main component is the dual affinity matrix (DAM), which solves the information mismatch problem.
Figure 3: Detailed structure of the pyramid color embedding (PCE) and color embedding (CE) modules, where $\odot$ denotes element-wise multiplication, $\oplus$ denotes element-wise addition, and $\otimes$ denotes the upsampling operation.

  • Dual affinity matrix
    From G-Net and C-Net we obtain the corresponding grayscale image and color histogram, which provide rich structure and texture details and color information, respectively. R-Net applies both to achieve a better enhancement effect. Since the color histogram contains no spatial information, simply concatenating the two leads to inaccurate lighting in the enhanced image. In addition, simple concatenation also causes a mismatch between the color information and the content, which may produce color deviations in the enhanced image.
    To solve the information mismatch problem and achieve better color information embedding, a new color embedding module is proposed, which dynamically fuses color features into R-Net according to the affinity between color and content features. The purpose of the proposed dual affinity matrix (DAM) is to compute an affinity matrix that matches color and content features, further preventing the enhanced image from producing inconsistent colors. Specifically, given the color feature $C$ and the content feature $F$, both of size $N \times H \times W$, DAM first computes at each location the negative Manhattan distance and the inner product of $C$ and $F$:
    $$\begin{gathered} M(x, y) = -\|F(x, y) - C(x, y)\|_1, \\ P(x, y) = F(x, y) \cdot C(x, y), \end{gathered}$$
    where $F(x, y), C(x, y) \in \mathbb{R}^N$ are the vectors of $F$ and $C$ at location $(x, y)$, and $M, P \in \mathbb{R}^{H \times W}$ are the Manhattan distance matrix and the inner product matrix. Then the dual affinity matrix $A$ is computed as
    $$A = 2 \times \operatorname{sigmoid}(M) \odot \tanh(P),$$
    where $\tanh(\cdot)$ and $\operatorname{sigmoid}(\cdot)$ are the tanh and sigmoid functions, respectively. Note that for each position $(x, y)$ we have $M(x, y) \le 0$, so $\operatorname{sigmoid}(M) \in [0, 0.5]$; we therefore use $2 \times \operatorname{sigmoid}(M)$ to keep $A$ within the $[0, 1]$ range.
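As a concrete illustration, the DAM computation can be sketched in a few lines of PyTorch. The shapes and names below are our own assumptions; the paper does not publish this exact code.

```python
import torch

def dual_affinity_matrix(f, c):
    """Dual affinity matrix sketch. f, c: content and color features,
    both of shape (N, H, W). Returns A of shape (H, W)."""
    m = -(f - c).abs().sum(dim=0)    # negative Manhattan distance M(x, y)
    p = (f * c).sum(dim=0)           # channel-wise inner product P(x, y)
    # M <= 0 everywhere, so sigmoid(M) lies in [0, 0.5]; the factor 2 rescales it
    return 2.0 * torch.sigmoid(m) * torch.tanh(p)
```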

  • Color embedding
    CE performs the dynamic embedding of color information, and its structure is shown in Figure 3. After obtaining the dual affinity matrix $A$, CE multiplies $A$ and the color feature $C$ element-wise, and adds the weighted color feature to the content feature $F$ to obtain the color-embedded feature:
    $$E = A \odot C + F$$
    where $E$ is the output feature used in the R-Net decoder. An upsampling operation is also applied to the color feature $C$ to change its resolution, and the result is fed into the next CE as its input color feature, as in the sketch below.
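Here is a minimal sketch of one CE step, reusing `dual_affinity_matrix` from above. Doubling the color feature's resolution is our assumption for matching the next decoder layer; the paper only states that an upsampling operation changes the resolution.

```python
import torch.nn.functional as F

def color_embedding(f, c):
    """One CE step: E = A ⊙ C + F, plus upsampling of the color
    feature for the next CE. f, c: (N, H, W) tensors."""
    a = dual_affinity_matrix(f, c)                   # (H, W)
    e = a.unsqueeze(0) * c + f                       # broadcast A over channels
    c_next = F.interpolate(c.unsqueeze(0), scale_factor=2.0,
                           mode='bilinear', align_corners=False).squeeze(0)
    return e, c_next
```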

  • Pyramid structure
    Given color features, we can use them to guide the enhancement process toward consistent colors. In order to fully exploit the color information, we build a PCE containing six pyramid-structured CEs (as shown in Figure 3). Given the color feature $C_i$ and the content feature $F_i$ of the $i$-th CE, the PCE features from shallow to deep layers are described as
    $$E_i, C_{i+1} = CE(F_i, C_i), \quad i = 1, 2, \cdots, 6$$
    where $E_i$ is the output feature and $CE(\cdot)$ denotes the transformation of CE. $C_i$ is computed by the $(i-1)$-th CE, whereas $F_i$ is copied from the corresponding layer of the R-Net encoder. The pyramid structure embeds color features into six layers; in other words, this progressive design can take full advantage of the color information, so the colors of the enhanced image are more consistent.
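Under the same assumptions, the six-level pyramid reduces to a simple loop over the CE steps sketched above:

```python
def pce(content_feats, c1):
    """Pyramid color embedding: E_i, C_{i+1} = CE(F_i, C_i), i = 1..6.
    content_feats: [F_1, ..., F_6] copied from the R-Net encoder;
    c1: the initial color feature derived from C-Net's output."""
    outputs, c = [], c1
    for f in content_feats:          # shallow to deep
        e, c = color_embedding(f, c)
        outputs.append(e)            # each E_i feeds the R-Net decoder
    return outputs
```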

3.3. Objective Function

The objective function of our DCC-Net is described as
$$l_{\text{total}} = \lambda_g l_g + \lambda_c l_c + \lambda_r l_r + \lambda_{ssim} l_{ssim} + \lambda_{tv} l_{tv}$$
where $\lambda_g, \lambda_c, \lambda_r, \lambda_{ssim}, \lambda_{tv}$ are trade-off parameters. Among them, $l_g$ and $l_c$ are used to restore the grayscale image and the color histogram, respectively; $l_r$ and $l_{ssim}$ reconstruct the normal-light image at the pixel level and the structure level; and $l_{tv}$ serves as a regularization term to prevent overfitting and maintain smoothness.
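
Putting the pieces together, the objective is a weighted sum of the five terms sketched earlier; the lambda values below are placeholders, not the paper's tuned trade-off parameters.

```python
def total_loss(g_pre, g_high, c_pre, c_high, s_pre, s_high,
               lam_g=1.0, lam_c=1.0, lam_r=1.0, lam_ssim=1.0, lam_tv=0.1):
    """l_total = λ_g l_g + λ_c l_c + λ_r l_r + λ_ssim l_ssim + λ_tv l_tv."""
    return (lam_g * l1_loss(g_pre, g_high)
            + lam_c * l1_loss(c_pre, c_high)
            + lam_r * l1_loss(s_pre, s_high)
            + lam_ssim * ssim_loss(s_pre, s_high)
            + lam_tv * tv_loss(s_pre))
```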
