LED

LED is a new pipeline for RAW image denoising in extremely dark scenes. It needs neither noise-parameter calibration nor repeated training: a small amount of paired data and fast fine-tuning are enough to adapt it to a target camera.

Homepage: https://srameo.github.io/projects/led-iccv23

Paper: https://arxiv.org/abs/2308.03448

Github: https://github.com/Srameo/LED

Google Drive: https://drive.google.com/drive/folders/11MYkjzbPIZ7mJbu9vrgaVC-OwGcOFKsM?usp=sharing

Calibration-based methods dominate the denoising of RAW images in extremely low-light environments. However, these approaches suffer from several major drawbacks:

  1. The noise-parameter calibration process is laborious and time-consuming,

  2. A denoising network trained for one camera is hard to transfer to another,

  3. The difference between synthetic and real noise is amplified by high digital gain.

To overcome these shortcomings, we propose a calibration-free pipeline for Lighting Every Darkness (LED), regardless of the digital gain or the kind of camera sensor. Our method needs neither noise-parameter calibration nor repeated training; only a small amount of paired data and fast fine-tuning are required to adapt to the target camera. Furthermore, simple structural changes narrow the domain gap between synthetic and real noise without any additional computational cost. With only a total of _6 pairs of paired data, 0.5% of the iterations, and 0.2% of the training time_ on SID[1], LED achieves SOTA performance!

Training with real paired data

SID[1] was the first to propose a complete benchmark and dataset for low-light enhancement and denoising of RAW images. Why start from RAW images for denoising and low-light enhancement? Because RAW offers a higher performance ceiling; see the SID paper[1] for details.

So what exactly does SID do? It is very simple: as shown on the left of Figure 1, capture a large amount of paired real data with a camera, then feed it directly into the network for training.

Figure 1: Training on real paired data (left) and training based on noise-model calibration (right)

But there is an important problem: different sensors have different noise models and parameters. Following this process, must we re-collect a large amount of data and retrain for every camera? Isn't that rather cumbersome?

The pipeline based on noise-model calibration

Regarding the questions above, recent papers [2][3][4][5] uniformly answer: yes. Calibration has become the mainstream: in various industrial scenarios (mobile phones, edge devices), denoising tasks have begun to adopt calibration-based methods.

So what is calibration? For the specific calibration procedure, refer to @Wang Hawk's article Noise Model and Calibration for Digital Camera Imaging (https://zhuanlan.zhihu.com/p/397873289). A pipeline based on deep learning plus noise-model calibration roughly consists of the following three steps (see the right of Figure 1):

  1. Design the noise model and collect calibration data,

  2. Use the calibration data from step 1 to estimate the parameters of the noise model; the gain (or ISO) and the noise variance are linearly related in the log domain (see the sketch after this list),

  3. Use the noise model calibrated in step 2 to synthesize paired data and train the neural network.
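To make step 2 concrete, here is a minimal sketch of fitting that log-linear relation. The measurements below are invented for illustration; real values come from the calibration procedure (bias/flat-field frames):

```python
import numpy as np

# Hypothetical per-ISO calibration measurements: system gain K and the
# read-noise variance estimated at each ISO (values are made up).
K = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
read_var = np.array([0.8, 1.5, 3.1, 6.2, 12.5])

# Gain and noise variance are assumed linearly related in the log
# domain, so a 1-D least-squares fit recovers slope and intercept.
slope, intercept = np.polyfit(np.log(K), np.log(read_var), deg=1)

# The fitted line predicts a plausible noise variance at any gain,
# which is what step 3 uses when synthesizing paired training data.
predict_var = lambda k: np.exp(slope * np.log(k) + intercept)
```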

In this way, each camera only needs its own calibration data (far easier to collect than a large-scale paired dataset) to train a dedicated denoising network for that camera.

But is the calibration-based approach really that good?

Calibration flaws and LED

Figure 2: Flaws of algorithms based on noise-model calibration

So what do we actually want?

  1. Simplified calibration [6][7], or even no calibration at all,

  2. Quick deployment to new cameras,

  3. Strong "generalization": excellent generalization to real scenes, overcoming the domain gap between synthetic and real noise.

So here comes LED!

Figure 3: The calibration-based algorithm compared to LED

  1. No calibration needed: Instead of the noise parameters of a real camera, we synthesize data with the noise parameters of virtual cameras,

  2. Rapid deployment: With a pretrain-finetune strategy, only a small amount of data is needed to fine-tune a few of the network's parameters for a new camera,

  3. Overcoming the domain gap: Finetuning on a small amount of real data gives the network the ability to remove real noise.

What can LED do?

Figure 4: 6 pairs of data + 1.5K iterations = SOTA performance!

Figure 5: A more intuitive comparison on a log scale

Method

LED is roughly divided into the following steps:

  1. From the predefined noise model Φ, randomly sample N sets of "virtual camera" noise parameters from its parameter space,

  2. Use the N sets of "virtual camera" noise parameters from step 1 to synthesize data and pretrain the neural network (see the sketch after this list),

  3. Collect a small amount of paired data with the target camera,

  4. Use the small amount of data from step 3 to finetune the network pretrained in step 2.
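A minimal sketch of steps 1 and 2 under an assumed Poisson-Gaussian noise model Φ. The parameter ranges, slope, and intercept below are illustrative placeholders, not the values used by LED:

```python
import torch

def sample_virtual_cameras(n):
    """Step 1: sample n 'virtual camera' noise parameters from the space of the noise model."""
    log_K = torch.empty(n).uniform_(-2.0, 2.0)        # log system gain
    # log-linear gain/variance relation plus per-camera jitter
    log_var = 0.9 * log_K + 0.5 + 0.2 * torch.randn(n)
    return log_K.exp(), log_var.exp()

def synthesize(clean, K, var):
    """Step 2: corrupt a clean RAW image with one camera's parameters."""
    shot = torch.poisson(clean / K) * K               # signal-dependent shot noise
    read = var.sqrt() * torch.randn_like(clean)       # signal-independent read noise
    return shot + read

K, var = sample_virtual_cameras(5)                    # 5 virtual cameras
clean = torch.rand(1, 4, 64, 64) * 100                # fake packed RAW batch
noisy = synthesize(clean, K[0], var[0])               # a camera-0 training pair
```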

Of course, an ordinary UNet cannot fulfill the three "wants" listed earlier. A small structural change is therefore required:

Figure 6: The RepNR block in the UNet structure

First, we pair every Conv3 (3×3 convolution) in the UNet with a set of CSAs (Camera-Specific Alignment). A CSA is simply a channel-wise weight plus a channel-wise bias that align features in feature space.

During pretraining, for data synthesized with the k-th camera's parameters, only the k-th CSA and the Conv3 are trained, as in the sketch below.
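Here is a minimal PyTorch sketch of a CSA and its per-camera routing, based only on the description above (the repo's actual RepNR implementation may differ):

```python
import torch
import torch.nn as nn

class CSA(nn.Module):
    """Camera-Specific Alignment: a channel-wise scale and bias."""
    def __init__(self, channels):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.bias = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x):
        return x * self.weight + self.bias

class RepNRBlock(nn.Module):
    """One Conv3 paired with N CSAs, one per virtual camera."""
    def __init__(self, channels, num_cameras):
        super().__init__()
        self.csas = nn.ModuleList(CSA(channels) for _ in range(num_cameras))
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x, cam_id):
        # Data synthesized with camera cam_id's parameters only passes
        # through (and trains) the cam_id-th CSA and the shared conv.
        return self.conv(self.csas[cam_id](x))
```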

When finetuning, we first average the pretrained CSAs to initialize CSA^T (for the target camera) and train it to convergence; we then add an extra branch and continue fine-tuning, where the extra branch learns the domain gap between synthetic and real noise.
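Continuing the sketch above, the finetune initialization could look like this (again an assumption-level illustration, not the repo's code):

```python
import torch

def init_target_csa(block):
    """Initialize CSA^T as the average of the N pretrained CSAs."""
    csa_t = CSA(block.conv.in_channels)   # CSA from the sketch above
    with torch.no_grad():
        csa_t.weight.copy_(torch.stack([c.weight for c in block.csas]).mean(0))
        csa_t.bias.copy_(torch.stack([c.bias for c in block.csas]).mean(0))
    # CSA^T is trained to convergence before the extra branch is added.
    block.csa_t = csa_t
    return block
```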

Of course, since the CSA and the convolution are both linear operations, we can reparameterize them into a single convolution at deployment time, so in the end no extra computation is introduced!

Figure 7: Reparameterization
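A sketch of folding a CSA into its paired 3×3 convolution (ignoring zero-padding border effects, as is common in structural reparameterization):

```python
import torch
import torch.nn as nn

def fold_csa_into_conv(csa, conv):
    """Fuse y = conv(x * w + b) into a single Conv2d for deployment."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels,
                      kernel_size=conv.kernel_size, padding=conv.padding,
                      bias=True)
    with torch.no_grad():
        # The channel-wise scale w folds into the conv weights...
        fused.weight.copy_(conv.weight * csa.weight.view(1, -1, 1, 1))
        # ...and the channel-wise bias b becomes an output bias: each
        # output channel accumulates b over its kernel window.
        bias = (conv.weight * csa.bias.view(1, -1, 1, 1)).sum(dim=(1, 2, 3))
        if conv.bias is not None:
            bias = bias + conv.bias
        fused.bias.copy_(bias)
    return fused
```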

Visual Results

The figures below show LED's ability to remove out-of-model noise, i.e., its ability to overcome the domain gap between synthetic and real noise.

Out-of-model noise is noise not predefined in the noise model, such as the lens-induced artifacts in Figure 8 or the dark-shading noise in Figure 9.

Figure 8: Out-of-model pattern removal (artifacts caused by the lens)

Figure 9: Out-of-model noise removal (dark shading)

Discussion: why are two pairs of data enough?

Figure 10

Remember the hint planted earlier: there is a log-linear relationship between the gain and the noise variance.

What does a linear relationship mean? Two points determine a straight line! In other words, two pairs of data (each pair provides the noise variance at one gain) are enough to determine this linear relationship. However, due to measurement error [2], the two pairs should have as large a gain gap as possible for the network to learn the linear relationship reliably, as the toy example below illustrates.
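A toy illustration with invented numbers: two (gain, variance) pairs pin down the log-linear model exactly.

```python
import numpy as np

# Two pairs of data, each providing the noise variance at one gain
# (numbers are invented). Two points determine the line in log space.
(K1, v1), (K2, v2) = (0.8, 1.2), (12.0, 30.0)   # low-ISO and high-ISO pairs
slope = (np.log(v2) - np.log(v1)) / (np.log(K2) - np.log(K1))
intercept = np.log(v1) - slope * np.log(K1)
# The larger the gap between K1 and K2, the less measurement error in
# v1/v2 perturbs the slope -- hence pairing ISO < 500 with ISO > 5000.
```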

The right of Figure 10 also shows that using 1, 2, or 4 pairs of data at the same gain makes little difference in performance. However, using two pairs with a large gain difference ("large" meaning ISO < 500 and ISO > 5000) improves performance greatly. This validates our assumption that two pairs of data suffice to learn the linear relationship.

The training and testing code, including our reproduction of ELD, has been open-sourced on GitHub (https://github.com/Srameo/LED).

Beyond the code, we have also open-sourced the paired data and a total of 15 models, covering the ELD and PG (Poisson-Gaussian) noise models on multiple cameras, different training strategies, and different stages (the pretrain and finetune stages of LED); see pretrained-models.md for details:

https://github.com/Srameo/LED/blob/main/docs/pretrained-models.md#network-for-benchmark

In addition, since the RepNR block has so far only been tested on UNet, we believe in its potential on other models and provide code for quickly applying it to them: you can add the RepNR block to your own network structure with only one line of code. Together with our noisy-clean pair generator, you can quickly verify the RepNR block's effectiveness on other architectures. Related explanations and code can be found in develop.md (https://github.com/Srameo/LED/blob/main/docs/develop.md).

Github:

https://github.com/Srameo/LED

Homepage:

https://srameo.github.io/projects/led-iccv23/

Paper:

https://arxiv.org/abs/2308.03448
