Learning to See in the Dark: paper reading notes

This is an image enhancement paper. The authors created a dataset that is unlike previous ones: the See-in-the-Dark (SID) dataset was captured in extremely dim light, and that alone counts as a major contribution.
In fact, I think the authors are really doing three jobs at once: demosaicing, image enhancement, and denoising.

denoising

In the introduction the authors review previous work, covering image denoising, demosaicing, and image enhancement. For denoising, the authors mention earlier deep-learning-based methods, and note that some work performs demosaicing and denoising jointly; the only flaw is that those experiments were conducted on synthetic data, not on real datasets.

low-light image enhancement

In reviewing enhancement, the authors say histogram equalization is one of the more classic methods for image enhancement. Another classic method is gamma correction, which boosts dark regions while suppressing relatively bright ones (a small sketch follows this block).
There are other enhancement methods too, such as the inverse dark channel prior, the Retinex model, and illumination-map estimation.
But some current image enhancement methods do not explicitly model image noise; they just apply some off-the-shelf denoising algorithm in a post-processing stage. (I have read too few image enhancement papers myself; leaving a hole for myself to fill later.)
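As a quick illustration of gamma correction (my own sketch, not code from the paper):

```python
import numpy as np

def gamma_correct(img, gamma=0.4):
    """Gamma correction for an image normalized to [0, 1].

    gamma < 1 lifts dark regions much more than bright ones;
    gamma > 1 does the opposite.
    """
    return np.clip(img, 0.0, 1.0) ** gamma

# A dark pixel (0.05) is boosted far more than a bright one (0.8):
print(gamma_correct(np.array([0.05, 0.8])))  # approx. [0.302, 0.915]
```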

noisy image datasets

The authors say SID is the first low-light dataset that comes with ground truth, so this contribution is quite significant.

building the dataset

As shown in the figure below:

There are two sensor mosaic types, a Bayer array (the Sony camera) and an X-Trans array (the Fuji camera), with 5094 raw images in total.
Each short-exposure dark image is paired with a brighter long-exposure image, and these pairs make up the dataset.
The authors add that the long-exposure images actually still contain noise; the target is an image of sufficiently good perceptual quality, rather than a high-contrast image.
In fact, I think images acquired this way certainly contain a lot of noise, but for this task that should not matter. The deployment scenario is wanting a photo taken in the dark to feel like a daytime image; what matters is perceptual quality, not strict fidelity to the scene, but how it feels.
Also, the image sizes of the two cameras are not the same, and both are quite large: one is 6000×4000, the other 4240×2832.

method

Before getting into their method, the authors review previous approaches: the traditional pipeline, the L3 method, and the burst method.

The traditional method is a series of steps: white balance first, then demosaicing, then denoising, sharpening, color space conversion, gamma correction, and some other things, finally producing the output image. Each camera probably has its own set of processing algorithms, i.e. its ISP. (A toy sketch of this stage order follows after this block.)
The L3 algorithm is different: it uses a large collection of local, linear, and learned filters to approximate the complex nonlinear pipeline.
But the authors say that neither the traditional pipeline nor the L3 pipeline can handle fast low-light imaging, because neither copes with the extremely low signal-to-noise ratio.
Besides these two methods, there is the burst imaging pipeline, commonly used on smartphones. It aligns and blends multiple images, but the process is very complex, involving e.g. dense correspondence estimation that may fail, it may not extend to video, and it may rely on lucky imaging.
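To make the stage ordering concrete, here is a toy sketch of such a pipeline. It is my own illustration, not any camera's actual ISP: for simplicity it starts from an already-demosaiced linear RGB image, and the gains and color matrix are made up:

```python
import numpy as np

# Placeholder stages; real ISPs use sophisticated, camera-specific algorithms.
def white_balance(rgb):
    return rgb * np.array([2.0, 1.0, 1.5])        # made-up per-channel gains

def denoise(rgb):
    return rgb                                    # no-op stand-in

def sharpen(rgb):
    return rgb                                    # no-op stand-in

def to_srgb_primaries(rgb):
    ccm = np.array([[ 1.6, -0.4, -0.2],           # made-up color matrix
                    [-0.3,  1.5, -0.2],
                    [ 0.0, -0.5,  1.5]])
    return rgb @ ccm.T

def gamma_encode(rgb):
    return np.clip(rgb, 0.0, 1.0) ** (1 / 2.2)

def traditional_isp(rgb):
    """Apply the fixed stage order from the notes above to an H x W x 3 image."""
    for stage in (white_balance, denoise, sharpen, to_srgb_primaries, gamma_encode):
        rgb = stage(rgb)
    return rgb
```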

As for the authors' method, there is nothing innovative about the network structure, shown in the figure below.

The input raw Bayer image has its pixels rearranged (packed) into 4 channels at half the spatial resolution.
For this rearrangement, Bayer arrays and X-Trans arrays are handled differently, as shown in the paper's figure.

A Bayer array is packed from its 2×2 blocks into 4 channels; an X-Trans array is arranged in 6×6 blocks and packed into 9 channels by exchanging adjacent elements (I did not fully understand this part). A sketch of the Bayer packing follows below.
The network output is a half-size, 12-channel image, which is then rearranged (sub-pixel style) into the full-resolution RGB result.
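A minimal sketch of the Bayer packing (my own illustration of the idea, not the official code):

```python
import numpy as np

def pack_bayer(raw):
    """Pack an H x W Bayer mosaic into an (H/2) x (W/2) x 4 tensor.

    Each output channel collects one position of the repeating 2x2
    Bayer block (e.g. R, G1, G2, B), halving the spatial resolution.
    """
    H, W = raw.shape
    H, W = H - H % 2, W - W % 2            # make dimensions even
    return np.stack([raw[0:H:2, 0:W:2],    # top-left of each 2x2 block
                     raw[0:H:2, 1:W:2],    # top-right
                     raw[1:H:2, 0:W:2],    # bottom-left
                     raw[1:H:2, 1:W:2]],   # bottom-right
                    axis=-1)

# Example: a 4x4 mosaic becomes a 2x2x4 packed tensor.
raw = np.arange(16, dtype=np.float32).reshape(4, 4)
print(pack_bayer(raw).shape)  # (2, 2, 4)
```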

Because the model processes full-resolution images (though training apparently crops them; see below), the network cannot be too large. The authors use two kinds of networks here: one is a CAN (context aggregation network), the other a U-Net.

training

During training, the authors use an L1 loss and the Adam optimizer.
The input is the short-exposure image and the output target is the long-exposure image. The ground truth here has already been processed with libraw, in the sRGB color space.
Because the two cameras' images have different sizes, the authors train two separate networks. Note that training uses an amplification ratio, which the authors set to the exposure-time ratio between input and ground truth.
They also take random 512×512 crops and apply random flips and rotations.
The learning rate starts at 1e-4 and is later reduced to 1e-5.
Training runs for 4000 epochs.
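A compact PyTorch-style sketch of one training step under this setup (my own reconstruction from the notes above; the official code is in TensorFlow, and everything here, including the flip-only augmentation, is a simplification):

```python
import torch

def train_step(model, opt, raw_patch, gt_patch, ratio):
    """One step: L1 loss between the network output and the long-exposure GT.

    raw_patch: packed short-exposure raw, randomly cropped (e.g. 512x512)
    gt_patch:  matching long-exposure sRGB ground-truth patch
    ratio:     exposure-time ratio between ground truth and input
    """
    inp = raw_patch * ratio                      # amplify the dark input
    if torch.rand(1).item() < 0.5:               # random flip (rotation omitted)
        inp, gt_patch = inp.flip(-1), gt_patch.flip(-1)
    loss = (model(inp) - gt_patch).abs().mean()  # L1 loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Per the notes: Adam with lr 1e-4, later dropped to 1e-5, for 4000 epochs.
# opt = torch.optim.Adam(model.parameters(), lr=1e-4)
```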

experimental results

The authors first show some qualitative results.

One can see that the traditional method cannot handle the noise or the color bias.
So traditional image enhancement methods, if they do not model the noise, produce severely noisy images. The authors therefore bring in an image denoising method, BM3D.
BM3D is a non-blind denoiser, so a noise level must be specified: set it too low and the noise is not fully removed; set it too high and the result is over-smoothed, as shown in the figure below.

Since both kinds of noise are present at the same time, BM3D cannot locally adapt to the data; by contrast, the authors' results look coherent as a whole.

The authors also say that comparing PSNR/SSIM directly against BM3D and the burst-processing method would be unfair, because these baselines require some extra processing. To make the comparison fairer, the authors describe a fairly involved procedure: they use the white-balance parameters of the reference image to reduce color bias (presumably applied to each method's output), and they scale each method's output channel by channel so that it has the same mean values as the reference image.
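A minimal sketch of the channel-wise mean matching step (my own reading of the procedure; the paper's exact implementation may differ):

```python
import numpy as np

def match_channel_means(output, reference):
    """Scale each channel of `output` so its mean equals the reference's.

    output, reference: H x W x C arrays in the same color space.
    """
    out = output.astype(np.float64)
    ref = reference.astype(np.float64)
    eps = 1e-8  # avoid division by zero on empty channels
    scale = ref.mean(axis=(0, 1)) / (out.mean(axis=(0, 1)) + eps)
    return out * scale
```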

Even so, the authors do not use these two metrics to judge the corrected images; they run an A/B test instead.
The results are as follows:

On the harder Sony x300 set, the authors' results crush BM3D; on the easier Sony x100 set, the results are comparable.
The authors also shot some photos of their own and tested on them, again getting good results.

ablation study

The authors' ablation study results are shown below; here they do use the PSNR and SSIM metrics.

Swapping the U-Net for the CAN causes a slight drop, though on Fuji the SSIM rises quite a bit.
Also, if the input color space is sRGB, the metrics drop by quite a few points; it feels like the Sony data is more sensitive to the color space while the Fuji data is not. Swapping the L1 loss for an SSIM loss makes the metrics fluctuate somewhat,
and swapping in an L2 loss gives roughly similar results.
The authors also compare the effect of different data arrangements on the results; I did not really understand the alternative arrangement, so I skip it here.
Next, the authors test what happens if histogram equalization is applied to the ground truth.
Histogram-equalizing the GT effectively asks the network to learn histogram equalization itself. The authors find that the network does not really learn this, and the metrics drop dramatically. Their conclusion from this experiment is that histogram equalization need not be folded into the network's pipeline; it can be left as a post-processing step.
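For reference, histogram equalization as a post-processing step can be as simple as this CDF-based sketch (the generic textbook version, not the paper's code):

```python
import numpy as np

def equalize_histogram(channel):
    """Histogram-equalize one uint8 channel by remapping through its CDF."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum().astype(np.float64)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min() + 1e-8) * 255.0
    return cdf.astype(np.uint8)[channel]  # lookup-table remap
```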

conclusion

In the conclusion the authors discuss quite a lot. Fast low-light imaging is extremely challenging because so few photons are captured and the SNR is very low.
A limitation of the proposed method is that the amplification ratio must be set manually; automatically selecting the amplification ratio would be very useful. The authors also say that each sensor (CCD) needs its own trained network, so the method does not generalize well. I feel that if this is ever put to use it would be per-camera anyway, so this is not a big problem.
The authors also note that the network is fairly slow, taking about 0.38-0.66 s to process a full-resolution image.
Finally, the authors hope future work can focus on improving image quality and on refining and integrating the training steps.

follow-up work

I searched Google Scholar for citations; it seems not many people cite this paper yet, and I have not seen a single work that builds directly on it.
By contrast, the GitHub repo has a lot of stars. Enough said; I will download the code, run it, and see whether any inspiration strikes.
