Low-light image enhancement based on multi-exposure image generation

Source: Robot

Authors: Guan Yu, Chen Xi'ai, Tian Jiandong, Tang Yandong

Summary

Low-light images reduce the robustness of many computer vision algorithms and seriously affect vision tasks in robotics such as autonomous driving, image recognition, and object tracking. To obtain enhanced images with richer detail and a larger dynamic range, a low-light image enhancement method based on multi-exposure image generation is proposed. By analyzing real multi-exposure images, it is found that the pixel values of images taken with different exposure times are linearly related, so the idea of orthogonal decomposition can be applied to multi-exposure image generation. The multi-exposure images are generated according to the physical imaging mechanism and are therefore close to real captured images. The original image is first decomposed into an illumination invariant and an illumination component; an adaptive algorithm then generates a set of illumination components, which are recombined with the illumination invariant to obtain multi-exposure images. Finally, a multi-exposure image fusion method produces an enhanced image with a larger dynamic range. The fusion result is consistent with the input image, and the final enhanced image effectively retains the color of the original image with high naturalness. Experiments were carried out on public datasets of real low-light images and compared with existing state-of-the-art algorithms. The results show that the structural similarity between the enhanced image obtained by this method and the reference image is increased by 2.1% and the feature similarity by 4.6%, i.e. the enhanced image is closer to the reference image and more natural.

Keywords

Multi-exposure image, low-light image enhancement, illumination decomposition, image fusion

1 Introduction

Images taken under low-light conditions or with insufficient camera exposure time are called low-light images. Such images typically have low brightness, low contrast, and ambiguous structural information, which makes many robot vision tasks difficult, such as face recognition in low-light images, target tracking [1], autonomous driving [2], and feature extraction [3]. Low-light image enhancement can not only improve the visual quality of the image but also improve the robustness of downstream robot vision algorithms, and therefore has important practical value.

Depending on whether they rely on large amounts of data for training, existing low-light image enhancement algorithms can be divided into two categories: traditional methods and deep-learning-based methods. Among the traditional methods, histogram-based methods [4-5] enhance low-light images by adjusting the image histogram to improve contrast. These methods are simple and efficient, but they lack a physical mechanism, often lead to over- or under-enhancement, and significantly amplify image noise. Methods based on Retinex theory [6] first decompose the image into an illumination component and a reflection component and then enhance them separately. Wang et al. [7] designed a low-pass filter to decompose the image into a reflection image and an illumination image, and used a double-logarithmic transformation to balance naturalness and image detail. Guo et al. [8] first took the maximum of the RGB channels of the low-light image as an initial illumination map, refined it with a structure prior, adjusted the brightness with gamma correction, and then combined the adjusted illumination image with the reflection image to obtain the final result. Ren et al. [9] proposed a noise-suppressed sequential decomposition model that estimates the illumination and reflection components separately; during the decomposition each component is spatially smoothed and a weight matrix is used to suppress noise and improve contrast, and the estimated reflection component is finally combined with the gamma-corrected illumination component, achieving low-light enhancement and joint denoising.

Deep-learning-based methods [10-13] achieve good low-light enhancement results by training on large amounts of data. Lore et al. [10] first proposed a deep autoencoder for contrast enhancement and noise removal in low-light images. Wei et al. [11] combined the Retinex model with a deep neural network. Jiang et al. [12] implemented a low-light enhancement model with a generative adversarial network that does not require paired training data. Guo et al. [13] converted the low-light enhancement task into an image-specific curve estimation task solved with a deep network. The training of these methods usually consumes considerable time and computing resources, and their performance depends heavily on the training data: inaccurate reference images degrade the training results. For example, a real normal-illumination image may, because of uneven lighting, contain overexposed local high-light areas or underexposed local low-light areas.

Uneven illumination is another problem to be solved in low-light image enhancement. For a locally dark image, raising the brightness too much overexposes the high-light areas, while raising it too little fails to reveal the details of the low-light areas. Thanks to advances in photographic equipment, a fixed camera can capture images with different exposure times within a short period, and fusing such a set of images yields an image with a larger dynamic range. Wang et al. [14] designed a smooth multi-scale exposure fusion algorithm in the YUV color space based on edge preservation, which preserves details in both the high-light and low-light areas of the scene; to compensate for detail lost during fusion, a vector-field construction algorithm extracts visible image details from the vector field, and the method avoids color distortion during fusion. Although image fusion effectively increases the dynamic range, it requires a set of images with different exposure times to be captured in advance and cannot enhance a single low-light image. Moreover, dynamic scenes or camera shake during shooting make the captured images difficult to align, which leads to artifacts in the fusion result.

To apply image fusion to low-light image enhancement and thereby increase the dynamic range, a set of images for fusion must be generated from a single image. Some existing methods already use the idea of image fusion for low-light enhancement. Fu et al. [15] first decomposed the image into an illumination image and a reflection image with a morphological-closing-based illumination estimation algorithm, then processed the illumination image with a Sigmoid function and adaptive histogram equalization to obtain a brightness-enhanced and a contrast-enhanced illumination image, and finally fused the two enhanced illumination images and recombined them with the reflection image to obtain the enhanced result. Cai et al. [16] collected 589 sets of multi-exposure images, fused each set with 13 existing methods, selected the best result as the reference image, and trained a convolutional neural network on this dataset, yielding a single-image low-light enhancer. Fusion-based single-image enhancement removes the need for multiple exposure images as input, but the methods of Fu et al. [15] and Cai et al. [16] still lack a physical mechanism.

To address these problems, this paper proposes a low-light image enhancement method based on multi-exposure image generation. Starting from the physical imaging mechanism, the relationship between exposure images is analyzed, and it is found that images with different exposure times are related in a way similar to shadow and non-shadow images. Based on this, the orthogonal decomposition method [17] is applied to multi-exposure image generation for the first time: the image is orthogonally decomposed into an illumination component and an illumination invariant, images with different exposure durations are generated by changing the illumination component, and an image fusion method then merges the generated images into a high-dynamic-range result. Since the generated images are close to real captured images, the naturalness of the fused enhancement result is well preserved. Moreover, the multi-exposure images generated from a single image are aligned pixel by pixel, so the fusion result contains no artifacts, which also removes the requirement of fixing the camera when shooting multi-exposure images. Finally, the method does not rely on large amounts of training data and therefore generalizes well.

2 Generation of multi-exposure images and enhancement of low-light images

The method in this paper consists of three parts: (1) image orthogonal decomposition: the original image is decomposed into an illumination component and an illumination invariant; (2) multi-exposure image generation: illumination components of different magnitudes are generated by changing the original illumination component and are recombined with the illumination invariant to obtain multi-exposure images; (3) multi-exposure image fusion: the multi-exposure images are fused to obtain the final enhanced image. Figure 1 shows the algorithm framework of this paper.

Figure 1 Algorithm framework

2.1 Linear relationship between multi-exposure images

Experiments show that there is a linear relationship between images of different exposures taken under fixed lighting and camera parameters, as shown in Figure 2. Figure 2(a) is a set of color-chart images taken with different exposure times. Figure 2(b) shows the real pixel values and fitted lines of the 24 color patches in the R, G and B channels under different exposures: the green circles are the real pixel values, the red solid lines are the fitted lines, all pixel values are the RGB values before gamma correction, the abscissa gives the pixel values of the first four chart images, and the ordinate gives the pixel values of the fifth chart image.

Figure 2 The linear relationship between the pixels of the real multi-exposure image

The linear relationship shown in Figure 2 can be expressed as

E_H^L = K_H · e_H^L,  H ∈ {R, G, B}   (1)

where E_H^L and e_H^L are the pixel values under the long and the short exposure time respectively, the subscript H indexes the R, G and B channels, the superscript L denotes values before gamma correction, and K_H is the ratio between pixel values under the two exposures. Since the fitted values of K_H are similar for the three channels, this paper sets K_R = K_G = K_B.
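As an illustration only (not the authors' code), the per-channel ratio K_H in formula (1) can be estimated from two registered exposures by a least-squares fit through the origin; the patch values below are hypothetical stand-ins for the 24 color-chart measurements.

```python
import numpy as np

# Hypothetical pre-gamma pixel values of the 24 color-chart patches under a
# short and a long exposure; shape (24, 3) for the R, G, B channels.
rng = np.random.default_rng(0)
short_exp = rng.uniform(0.02, 0.2, size=(24, 3))
long_exp = 4.0 * short_exp + rng.normal(0.0, 0.002, size=(24, 3))  # roughly K = 4

# Per-channel least-squares fit of E = K_H * e through the origin.
K = (long_exp * short_exp).sum(axis=0) / (short_exp ** 2).sum(axis=0)
print("fitted K_R, K_G, K_B:", K)

# The three fitted values are close, so a single scalar K_R = K_G = K_B is used.
K_scalar = float(K.mean())
```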

2.2 Image orthogonal decomposition

Applying gamma correction to both sides of formula (1) gives

E_H = K_H^(1/γ) · e_H,  H ∈ {R, G, B}   (2)

where E_H and e_H are the gamma-corrected pixel values. This has the same form as the RGB three-channel relationship [18] exhibited by pixels of the same object in shadowed and non-shadowed regions.
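A tiny numerical check (illustrative, with made-up values) of why the proportionality in formula (1) survives gamma correction: raising both sides to the power 1/γ leaves a constant per-channel ratio, as formula (2) states.

```python
import numpy as np

gamma = 2.2
e = np.array([0.08, 0.12, 0.05])   # hypothetical short-exposure values (pre-gamma)
K = 4.0                            # exposure ratio, with K_R = K_G = K_B
E = K * e                          # formula (1)

# After gamma correction x -> x ** (1 / gamma), the ratio is still constant:
ratio = (E ** (1 / gamma)) / (e ** (1 / gamma))
print(ratio)                       # every entry equals K ** (1 / gamma) ≈ 1.878
```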

Similar to the derivation in [17], a linear constraint on the pixel values, given as formula (3), can be obtained from formula (1); the quantity C in it can be calculated from the pixel values.

For any pixel in the image with pixel value v = [vR, vG, vB]^T, formula (4) is obtained from formula (3).

Our research group first proposed the orthogonal decomposition method in [17], shown as formula (5), which decomposes the solution of equation (4) as

u = α · u0 + up   (5)

where u0 is the free solution of equation (4), satisfying A·u0 = 0 and ∥u0∥ = 1, and up is the unique special solution of equation (4) satisfying up ⊥ u0. The up and α corresponding to any pixel of the image can be calculated from its pixel value. Figure 3 illustrates the orthogonal decomposition for shadow images and for multi-exposure images. Linear algebra shows that the free solution u0 depends only on K_H: in a shadow image, K_H is determined solely by the lighting conditions, while in multi-exposure images it is determined solely by the incident light under the different exposure times. The special solution up is perpendicular to the free solution u0, i.e. the two are independent and mutually orthogonal, so up has the property of illumination invariance. This means that, no matter with what exposure time a pixel is photographed, orthogonally decomposing its pixel value yields a unique special solution up that is unaffected by the incident light, while the magnitude of α reflects the change of illumination. The up of all pixels constitutes the illumination invariant of the image, and the α of all pixels constitutes its illumination component.


Figure 3 Schematic diagram of orthogonal decomposition in shadow images and multi-exposure images
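A minimal numerical sketch of the decomposition in formula (5): a pixel's vector u is split into a component α along the unit free solution u0 and an orthogonal part up. How u and u0 are actually constructed from the pixel value (including any transform involved) follows [17] and is not reproduced here; the u0 used below is only a placeholder.

```python
import numpy as np

def orthogonal_decompose(u, u0):
    """Split u into alpha * u0 + u_p with u_p perpendicular to u0 (||u0|| = 1)."""
    u0 = u0 / np.linalg.norm(u0)
    alpha = float(np.dot(u, u0))   # illumination component of this pixel
    u_p = u - alpha * u0           # illumination invariant, u_p is orthogonal to u0
    return alpha, u_p

def recompose(alpha, u_p, u0):
    """Inverse step used when generating new exposures (formulas (6)-(7))."""
    u0 = u0 / np.linalg.norm(u0)
    return alpha * u0 + u_p

# Placeholder free solution; in the paper u0 is determined by K_H (see [17]).
u0 = np.ones(3) / np.sqrt(3.0)
u = np.array([0.35, 0.30, 0.25])   # hypothetical per-pixel vector
alpha, u_p = orthogonal_decompose(u, u0)
brighter = recompose(alpha + 0.2, u_p, u0)   # illumination increment delta_alpha = 0.2
```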

2.3 Adaptive generation of multi-exposure images

After the illumination invariant and the illumination component of the original image are obtained, illumination components corresponding to different exposure times can be produced by strengthening or weakening the original one:

α′ = α + Δα   (6)

where Δα is the illumination increment; by controlling the size of Δα, illumination components of different exposure durations are obtained. Images with different exposure times are then generated by formula (7):

u′ = α′ · u0 + up   (7)

Finally, u′ is transformed back into pixel values in the RGB space according to formula (8), in which 1_{3×1} denotes a 3×1 vector of all ones.

Figure 4 shows a group of real-shot multi-exposure images together with multi-exposure images generated by controlling the size of Δα; all generated images take the first real-shot image as the original image. As the figure shows, the difference between a generated image and the corresponding real-shot image grows as the brightness difference between the generated image and the original image increases.


Figure 4 A group of real-shot multi-exposure images and generated multi-exposure images

To generate multi-exposure images automatically, an adaptive algorithm is designed that selects the illumination increment Δα according to the brightness of the original image and generates N images with different exposure times.

The brightness of any pixel with pixel value v = [vR, vG, vB]^T is defined by formula (9).

The brightness L of the entire image is given by formula (10) as the average of the per-pixel brightness over the whole image, where p denotes the total number of pixels in the image.

Denote the illumination increments of the N generated images by Δαi, i = 1, 2, ⋯, N. First the minimum increment Δα1 and the maximum increment ΔαN are determined, and then N increments are generated evenly between them according to formula (11):

Δαi = Δα1 + (i − 1)(ΔαN − Δα1)/(N − 1),  i = 1, 2, ⋯, N   (11)

When L < 0.3, only images with increased illumination components are generated, that is, the minimum illumination increment Δα1 = 0. The maximum illumination increment ΔαN is calculated by formula (12), in which formula (13) gives the R, G and B pixel values of an original pixel after its illumination component is increased by ΔαN.

Since a larger illumination increment between the generated image and the original image produces a larger error between the generated image and a real image of the corresponding exposure, ΔαN is capped: when the calculated ΔαN > 1.2, ΔαN is set to 1.2.

When L > 0.3, images with both increased and decreased illumination components are generated. ΔαN is obtained in the same way as when L < 0.3, and Δα1 is calculated by formula (14), in which formula (15) gives the R, G and B pixel values of an original pixel after its illumination component is increased by Δα1. When the calculated Δα1 < −0.5, Δα1 is set to −0.5.
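The sketch below illustrates the adaptive schedule of Section 2.3 under stated assumptions: the mean-brightness computation stands in for formulas (9)-(10), and the target increments stand in for formulas (12)-(15), which are not reproduced here; only the even spacing of formula (11), the L = 0.3 switch, the 1.2 cap and the −0.5 floor are taken from the text.

```python
import numpy as np

def mean_brightness(img):
    """Image brightness L; the channel mean over all pixels is a stand-in for
    the per-pixel brightness of formulas (9)-(10). img is float in [0, 1]."""
    return float(img.mean())

def exposure_increments(img, N=5, cap=1.2, floor=-0.5):
    """Return N evenly spaced illumination increments (formula (11)).
    delta_max / delta_min below are placeholder heuristics, not the paper's
    formulas (12)-(15)."""
    L = mean_brightness(img)
    # Placeholder: push the image toward mid-level brightness (assumption).
    delta_max = min(max(2.0 * (0.5 - L), 0.0), cap)
    if L < 0.3:
        delta_min = 0.0                    # only brighter images are generated
    else:
        delta_min = max(-L, floor)         # darker images are generated as well
    return np.linspace(delta_min, delta_max, N)

# Example: a dark image should yield only non-negative increments.
dark = np.full((100, 100, 3), 0.1, dtype=np.float32)
print(exposure_increments(dark))           # e.g. [0.0, 0.2, 0.4, 0.6, 0.8]
```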

In this paper N = 5. Figure 5 shows a set of adaptively generated multi-exposure images. In the fifth generated image, the sculpture in the low-light area is clearly enhanced, but the high-light area (the scene outside the window) is overexposed; in the third generated image, the high-light area is properly exposed, but the low-light area remains underexposed. This shows that individual exposure images generated to simulate natural shooting cannot properly display the information of high-light and low-light areas at the same time.


Figure 5 A set of adaptively generated multi-exposure images

2.4 Multi-exposure image fusion

To obtain an enhanced image with a larger dynamic range, this paper adopts the multi-scale exposure fusion method [14] to fuse the multi-exposure images into the final enhanced image. Unlike [14], which fuses a set of captured multi-exposure images, this paper fuses N exposure images adaptively generated from a single image; this realizes single-image low-light enhancement and effectively avoids the artifacts caused by dynamic scenes or camera shake during shooting. The specific algorithm is as follows (a rough code sketch is given after step (7)):

(1) From the original image, generate N multi-exposure images Qk (k = 1, 2, ⋯, N) by the method in Section 2.3; N = 5 in this paper.

(2) Convert Qk (k = 1, 2, ⋯, N) from the RGB color space to the YUV color space to obtain Ik (k = 1, 2, ⋯, N).

(3) Calculate the weight map W_{ij,k} of each exposure image according to formula (16), where i, j index the pixel position and the weight combines terms C, S, E and B computed from the contrast, saturation, exposure and image brightness respectively; see [14] for the specific calculations. So that the weights of the N images sum to 1 at each pixel, W_{ij,k} is normalized to obtain Ŵ_{ij,k} = W_{ij,k} / Σ_k W_{ij,k}.

(4) For each Ik and the corresponding normalized weight map Ŵk, build a Laplacian pyramid and a Gaussian pyramid respectively, with n pyramid layers. As in [14], n = ⌊log2 min(h, w)⌋ − 2, where h and w are the numbers of rows and columns of the image and ⌊⋅⌋ denotes rounding down.

(5) Fuse pyramid layers 1 to n − 1 according to formula (17): at each of these layers, the Laplacian pyramid coefficients of the N images are combined in a weighted sum, with the Gaussian pyramids of the normalized weight maps providing the weights.

(6) Fuse the n-th (coarsest) pyramid layer according to formula (18), in which a Gaussian filter is applied for smoothing and the parameter β is set to 1.5 for k = 1, 2 and to 0 for k = 3, 4, 5.

(7) The final low-light image enhancement result is obtained from the fused pyramid according to formula (19).
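To make steps (3)-(7) concrete, here is a rough Mertens-style sketch, not the authors' implementation: the weight in formula (16) is replaced by the standard contrast × saturation × well-exposedness product (the paper's brightness term B is omitted), the fusion is done in RGB rather than YUV, and the β-controlled smoothing of the coarsest layer in formula (18) is skipped.

```python
import cv2
import numpy as np

def pyramid_levels(h, w):
    """n = floor(log2(min(h, w))) - 2, as in step (4)."""
    return int(np.floor(np.log2(min(h, w)))) - 2

def gaussian_pyramid(img, n):
    gp = [img]
    for _ in range(n - 1):
        gp.append(cv2.pyrDown(gp[-1]))
    return gp

def laplacian_pyramid(img, n):
    gp = gaussian_pyramid(img, n)
    lp = []
    for i in range(n - 1):
        size = (gp[i].shape[1], gp[i].shape[0])
        lp.append(gp[i] - cv2.pyrUp(gp[i + 1], dstsize=size))
    lp.append(gp[-1])                                  # coarsest level kept as-is
    return lp

def fuse_exposures(exposures, eps=1e-12):
    """Weighted pyramid fusion of the N generated exposures (steps (3)-(7)).
    exposures: list of float32 RGB images in [0, 1] with identical shapes."""
    n = pyramid_levels(*exposures[0].shape[:2])

    # Step (3): per-pixel weight maps (simplified stand-in for formula (16)).
    weights = []
    for q in exposures:
        gray = cv2.cvtColor(q, cv2.COLOR_RGB2GRAY)
        contrast = np.abs(cv2.Laplacian(gray, cv2.CV_32F))
        saturation = q.std(axis=2)
        well_exposed = np.prod(np.exp(-((q - 0.5) ** 2) / (2 * 0.2 ** 2)), axis=2)
        weights.append(contrast * saturation * well_exposed + eps)
    total = sum(weights)
    weights = [w / total for w in weights]             # normalize so weights sum to 1

    # Steps (4)-(6): Laplacian pyramids of the images weighted by Gaussian
    # pyramids of the normalized weight maps, summed over the N images.
    fused = None
    for q, w in zip(exposures, weights):
        lp = laplacian_pyramid(q, n)
        gw = gaussian_pyramid(w, n)
        contrib = [lp[l] * gw[l][..., None] for l in range(n)]
        fused = contrib if fused is None else [f + c for f, c in zip(fused, contrib)]

    # Step (7): collapse the fused pyramid into the final enhanced image.
    out = fused[-1]
    for l in range(n - 2, -1, -1):
        size = (fused[l].shape[1], fused[l].shape[0])
        out = cv2.pyrUp(out, dstsize=size) + fused[l]
    return np.clip(out, 0.0, 1.0)
```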

3 Experimental results and analysis

3.1 Parameter setting

To determine how the number N of generated multi-exposure images affects performance, the algorithm was tested on 500 images of the LOL dataset [11], and the enhancement results for different values of N were evaluated with three metrics, as shown in Figure 6.


Figure 6 The choice of parameter N and the evaluation results on the LOL dataset

Peak signal-to-noise ratio (PSNR), structural similarity (SSIM) and feature similarity (FSIM) [19] are three full-reference evaluation metrics; larger values indicate a better enhancement result. Figure 6 shows that PSNR first rises with N, peaks at N = 4, and then falls as N increases further: when N is too small the input information is insufficient, and when N is too large the noise in the fusion result is amplified, so PSNR decreases. SSIM increases with N, because more input images provide more information and the structure of the fusion result becomes clearer. FSIM is low at N = 2, and for N > 2 the value of N has little effect on it. Considering the three metrics together, N is set to 5 in this paper.

3.2 Comparative experiments on public datasets

This section compares the method in this paper with five representative low-light enhancement methods on the test sets of two public datasets, in terms of both subjective visual quality and objective evaluation metrics. The NPE method [7], LIME method [8] and JED method [9] are traditional methods, while the RetinexNet method [11] and ZeroDCE++ method [13] are deep-learning-based methods. The test set of the LOL dataset [11] contains 15 groups of images, in which both the low-light images and the reference images are captured by camera. The test set of the MIT dataset [20] contains 500 groups of images, in which the low-light images are captured by camera and the reference images are manually adjusted with software by five photographers (A/B/C/D/E); the adjustment result of photographer C is used as the reference image, and the images are converted to 400×400-pixel PNG images for testing. Figure 7 shows the low-light enhancement results on the LOL test set [11], and Figures 8 and 9 show the outdoor and indoor enhancement results on the MIT test set [20], respectively. As Figure 7 shows, the enhancement result of the proposed method is the closest to the reference image, i.e. the closest to a real captured image. The NPE method [7] successfully raises the brightness and increases the saturation, making the colors more vivid, but it sometimes causes color distortion: for example, the skin in the result of Figure 9(a) appears reddish, and the black clothes in the result of Figure 9(b) appear whitish. The LIME method [8] also raises the brightness and enhances the contrast, but it over-enhances local high-light areas; for example, the face in the result of Figure 9(e) is overexposed because of over-enhancement. In the results of the JED method [9], the brightness is slightly lower than that of the other methods, and the denoising step over-smooths the restored image and loses detailed texture. The RetinexNet method [11] highlights the structural information of the image well, but the style of the enhanced image differs considerably from that of a real captured image and looks unnatural. The ZeroDCE++ method [13] also changes the style of the enhanced result: as shown in Figures 8(b) and 9, its results appear whitish. Compared with the above methods, the enhanced results of the proposed method better preserve the original colors, are closer to the colors of real captured images, and display the information of both high-light and low-light areas without overexposing the high-light areas through over-enhancement.


Figure 7 Enhancement results of low-light images on the LOL test dataset [11]   


Figure 8 Enhancement results of outdoor low-light images on the MIT test dataset [20] 


Figure 9 Enhancement results of indoor low-light images on the MIT test dataset [20]

In order to quantitatively evaluate the enhancement effect of the method in this paper, the full reference evaluation indicators PSNR, SSIM and FSIM [19] are used to evaluate the enhancement results of each method. The larger the PSNR, the closer the enhancement result is to the reference image in terms of pixel values; the larger the SSIM, the closer the enhancement result is to the reference image in structure; the larger the FSIM, the closer the enhancement result is to the reference image in terms of features.
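For reference, PSNR and SSIM can be computed with scikit-image as sketched below (this assumes scikit-image ≥ 0.19 for the channel_axis argument); FSIM [19] is not available there and would need a separate implementation, so it is omitted.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(enhanced, reference):
    """Full-reference scores between an enhanced image and its reference.
    Both are float arrays in [0, 1] with shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(reference, enhanced, data_range=1.0)
    ssim = structural_similarity(reference, enhanced, channel_axis=-1, data_range=1.0)
    return psnr, ssim
```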

Table 1 presents the quantitative evaluation results of the different methods on the LOL test set [11], and Table 2 presents those on the MIT test set [20]; the best value of each metric is shown in red and the second-best in blue. Table 1 shows that the method in this paper is the best in all three metrics on the LOL test set [11], i.e. it obtains the enhancement result closest to the real captured reference image. Table 2 shows that on the MIT test set [20] the method achieves the best SSIM and FSIM and the second-best PSNR, indicating that its enhancement results are the closest to the manually adjusted reference images in terms of structure and features.


Table 1 Quantitative evaluation results on the LOL test set [11]


Table 2 Quantitative evaluation results on the MIT test set [20]

3.3 Experimental analysis of multi-exposure fusion

This section first compares the fusion enhancement performance of the multi-exposure images generated by the proposed method with that of multi-exposure images captured by a fixed camera, as shown in Figure 10. Figures 10(a) and 10(b) are images taken with different exposure times after fixing the camera; Figure 10(c) is the enhancement result obtained by the proposed method using Figure 10(a) as the input image, and Figure 10(d) is the result of fusing Figures 10(a) and 10(b) as input images. The contrast of the proposed method's result in Figure 10(c) is better than that of the real-shot multi-exposure fusion result in Figure 10(d), and the enlarged red-box region shows that the result of the proposed method is also sharper. In the yellow box of Figure 10(d), artifacts appear in the fusion because a dynamic scene (a motorcyclist) was captured, whereas this problem does not exist in the result of the proposed method.

Figure 10 Comparison of the method in this paper and the fusion results of real-shot multi-exposure images

Second, the enhancement effect of the multi-exposure fusion result generated by this paper is compared with the result of a single exposure increase, as shown in Figure 11. In the result that only increases the exposure, the low-light parts of the original image (such as the buildings) are clearly enhanced, but the high-light parts (such as the sky and the lights) lose information because of overexposure. In the multi-exposure fusion result, the information of both the low-light and the high-light parts is preserved, which indicates that the method in this paper can effectively raise the image brightness and obtain an enhanced result with a larger dynamic range.

Figure 11 Comparison of multi-exposure fusion results and increased exposure results 

4 Conclusion

This paper proposes a low-light image enhancement method based on multi-exposure image generation. Starting from the physical mechanism, the method generates multi-exposure images from a single low-light image and thus achieves single-image low-light enhancement, with results better than some existing single-image enhancement methods and multi-exposure fusion methods. Because the enhancement is based on a single low-light image, the method effectively avoids artifacts and is more versatile than methods that fuse multiple captured exposure images.


 "Artificial Intelligence Technology and Consulting" released
