TPAMI 2023: Enhancing learnability for low-light denoising from a data perspective




Author: Mo Tianming (Source: Zhihu, reprinted with permission) | Editor: CVer Official Account

https://zhuanlan.zhihu.com/p/651674070

This article presents our work "Learnability Enhancement for Low-Light Raw Image Denoising: A Data Perspective", which has just been accepted by TPAMI.


Home page: fenghansen.github.io/publication/PMN/

Paper: ieeexplore.ieee.org/document/10207751

Code link (open source):

https://github.com/megvii-research/PMN/tree/TPAMI

Foreword

This work is an expanded version of PMN [1], which won the Best Paper Runner-Up Award at ACM MM 2022. In the conference version, we proposed the idea of learnability enhancement and introduced in detail two concrete solutions that implement it: Shot Noise Augmentation (SNA), which increases the amount of data, and Dark Shading Correction (DSC), which reduces the complexity of the data.

The journal version mainly adds the following:

  1. Facing the data quality problems in existing datasets, we designed a new data collection process and collected a high-quality mobile-phone low-light denoising dataset, aiming to avoid inherent learnability defects in the data.

  2. We analyzed the flaws in the original application of SNA in depth and improved its application strategy based on the dark-frame idea of SFRN. With these defects compensated, SNA lets the denoised image present more details.

  3. We extended the linear dark shading model and analyzed the robustness and generalization of DSC in depth. Based on the noise calibration data we provide, we explore how DSC can be combined with noise modeling, and the large performance gains this combination brings.

I wrote an article last year introducing the details of the conference version of PMN:

https://zhuanlan.zhihu.com/p/544592330

In this article, I hope to share the thinking behind the paper from a more macro perspective and to explain, with deeper analysis, why the journal version's additions matter, making this attempt to "throw out a brick to attract jade" more valuable. Accordingly, this article may gloss over some details of the methods described in the conference version. If you are more interested in the method details (SNA, DSC), you are welcome to read the interpretation article for the conference version linked above.

[Reminder]
The PMN series is industry-oriented by nature: it pushes the upper limit on the premise that you can access the Raw data collected by the camera. If you cannot even get permission to collect Raw data from the camera, PMN probably will not work for you. Our goal is to push data usage efficiency to the limit, so that data can be used reasonably and efficiently to build a denoising network.

The fit crisis that lurks beneath the data

Thanks to the rapid growth of AI computing power, learning-based denoising algorithms have become the mainstream choice for all but extremely low-compute devices. A learning-based denoising algorithm essentially learns the mapping between real data, so data is crucial. Learnability refers to how easily a data mapping can be approximated by a neural network. Enhancing the learnability of the data mapping is one of the most effective ways to improve denoising performance. However, most image denoising studies focus on customizing complex neural networks to fit the data mapping, while ignoring the learnability of the data mapping itself, i.e., the data problem.

Real paired data theoretically reflects the most realistic denoising mapping, so training on it is, in theory, the upper-limit method. In practice, however, the learnability of the mapping between real paired data is often seriously insufficient, so this "upper limit" often does not hold. From a data perspective, the learnability of the data mapping in image denoising largely depends on the complexity of the noise distribution, the volume of the paired data, and the quality of the paired real data, which correspond to the following problems in current real paired datasets:

  1. The volume of real paired data is too small, so the data mapping is hard to learn precisely; that is, precision is limited.

    Due to physical constraints, real shots cannot be collected indefinitely. Over time, environmental changes easily introduce various brightness and spatial misalignments. This is unavoidable when building datasets.

  2. The complexity of the camera's real noise is too high, so the data mapping is hard to learn accurately; that is, accuracy is insufficient.

    In essence, globally inconsistent FPN hurts neural-network denoising performance: a network with a limited receptive field cannot efficiently handle globally inconsistent FPN. (Applying a Transformer over a whole 4000×3000 image is impractical, and cutting it into patches runs into the same problem.)

  3. The quality of real data needs improvement: the data mapping struggles to represent the real denoising mapping; that is, reliability is insufficient.

    A dataset has several key quality dimensions (data volume, diversity, and pairing). Data collection in low light is difficult and usually involves trade-offs; pairing in particular is often sacrificed, so the training data and the actual scene no longer share the same distribution. As a result, what the network learns is not a pure denoising mapping at all.

This work is dedicated to addressing the data problem, so that any neural network can learn a precise, accurate, and reliable denoising mapping from inevitably limited real paired data.


Enhancing learnability on existing data

The Dilemma of Pure Noise Modeling

Noise modeling methods attempt to bypass real paired data by synthesizing data. For synthetic data backed by a noise model, a data sample is available whenever you draw from the model distribution, so the amount of data is essentially infinite. Data quality is not an issue either: paired data for a specific camera is hard to find, but clean, high-quality images are plentiful online. Noise modeling thus seems like the perfect solution. In practice, however, noise modeling schemes often fail to deliver on this promise.

From the perspective of noise, every sensor's construction is unique.
The noise model (especially the read noise model) is closely tied to the sensor's circuit design, which is itself very complicated. Sensor manufacturers generally do not disclose their circuit designs, so algorithm developers can only treat the sensor as a black box.
Most awkwardly, manufacturers may design functional circuits of their own and switch circuit modes to improve image quality under conditions unknown to the algorithm developer, causing sudden changes in the noise model. Manufacturers are not wrong to do this; Raw image quality does improve. But such external black-box behavior causes a lot of trouble for subsequent noise modeling. (New CFAs such as Quad Bayer and RGBW pose a similar noise modeling dilemma.)

From the perspective of modeling, different noise modeling approaches have their own defects.
Physical noise modeling builds on the sensor's physical characteristics and statistical distributions, i.e., it models the main features that can be analytically resolved; ELD [2] is representative. The idea is naturally sound, but extreme darkness is a magnifying glass: even ELD's noise modeling still struggles to accurately cover the complex real noise model. Without additional sensor information, physical noise modeling always seems powerless against complex real noise that cannot be fully resolved.
Learning-based noise modeling learns the noise model from data. Existing methods generally fit the mapping from clean (+noise) to noise (or noisy), hoping the power of neural networks at fitting data mappings can finish what physical noise modeling could not. But some problems are easily overlooked. It is well known that deep learning's success relies on large amounts of data. If we had plenty of clean-to-noisy paired real data, would we really need noise modeling so urgently? If the amount of data is insufficient, can we really learn a complex noise model? Even with enough data, can existing methods guarantee that the noise model is learned accurately rather than overfit to the dataset? Learning-based noise modeling work generally sidesteps the data problem, and is often powered by a large amount of paired real data, exactly the thing noise modeling is meant to replace.

All in all, there is still a significant gap between synthetic noise from pure noise modeling and real noise, so models trained on synthetic data perform poorly in extremely dark scenes. For now, pure noise modeling cannot bypass the learnability bottleneck.

When Paired Real Data Meets Noise Modeling

It is generally believed that synthetic data based on noise modeling and real data from camera captures are in competition. Even moderate conciliators usually see one merely as a complement to the other, a performance-boosting add-on. In fact, the relationship between synthetic data and real data is not either-or: we can analyze half of the noise with physical noise modeling and replace the remaining half with real data.

A typical example is SFRN (Sampling From the Real Noise) [3]. SFRN keeps the signal-dependent part (shot noise) from physical noise modeling and directly replaces the signal-independent read noise with real captured dark frames. The method is simple and practical, but it has a defect: noise diversity is easily insufficient, because sampling is too discrete. The original SFRN work partly compensated for this (via high-bit reconstruction). However, after we enriched the diversity of dark frames, we found further improvements were possible (the SFRN reproduced in the TPAMI version is much better than in the MM version), which shows that the idea of replacing read noise by sampling does not have to suffer from low sampling diversity. More importantly, giving up an analytical read noise model forfeits part of the opportunity to use the noise model to assist denoising (namely, the application of DSC).
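The division of labor described above (analytic shot noise plus sampled real read noise) can be sketched in a few lines. The function name, the gain parameter `K`, and the dark-frame bank interface are illustrative assumptions, not SFRN's released code:

```python
import numpy as np

def sfrn_synthesize(clean, dark_frame_bank, K, rng):
    """Sketch of the SFRN idea: model the signal-dependent shot noise
    physically (Poisson), and replace the signal-independent read noise
    with a real captured dark frame sampled from a bank.

    clean           : clean raw image (black level subtracted), linear DN
    dark_frame_bank : list of real dark frames, same shape as clean
    K               : system gain in DN per photo-electron (assumed calibrated)
    """
    shot = K * rng.poisson(clean / K)                           # physical shot noise
    read = dark_frame_bank[rng.integers(len(dark_frame_bank))]  # real read noise
    return shot + read
```

In this framing, "enriching dark-frame diversity" simply means enlarging the bank, which is consistent with the improvement observed in the TPAMI reproduction.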

Our PMN differs from SFRN and adopts an idea that respects the real noise model more: without destroying the noise model, we use noise modeling to transform the real paired data.

(Figure) Simplified noise model. We decompose the noise model into four parts; in the paired real data, the clean image corresponds to the far left and the noisy image to the far right.

This paper proposes a learnability enhancement strategy for low-light Raw image denoising. From a data perspective, we use noise modeling to transform real paired data so that it provides a more learnable data mapping for neural networks.

Based on the recognition that photon shot noise can be accurately modeled by a Poisson distribution, we propose Shot Noise Augmentation (SNA) to increase the amount of real paired data. Benefiting from the increased data volume, the learnability-enhanced data mapping yields clearer textures in denoised images.
In the journal version, we additionally treat the dark frame as a noisy image that SNA can augment. This essentially incorporates SFRN into the learnability enhancement paradigm, using "infinitely dark" dark frames to compensate for SNA's limitation of only being able to augment in the brighter direction.
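A minimal sketch of the Poisson additivity that SNA relies on follows; `shot_noise_augment`, the scalar increment `delta`, and the gain `K` are my illustrative assumptions rather than the paper's implementation:

```python
import numpy as np

def shot_noise_augment(clean, noisy, delta, K, rng):
    """Sketch of Shot Noise Augmentation (SNA).

    If `noisy` carries correct shot noise for the signal level `clean`,
    then adding an independent Poisson sample for the increment `delta`
    yields a valid noisy observation of `clean + delta`: sums of
    independent Poisson variables are again Poisson, so the noise model
    is preserved while the pair is brightened.

    clean, noisy : paired raw images (black level subtracted)
    delta        : non-negative signal increment (scalar or per-pixel)
    K            : system gain in DN per photo-electron (assumed calibrated)
    """
    lam = np.broadcast_to(np.asarray(delta, dtype=float) / K, clean.shape)
    increment = K * rng.poisson(lam)   # shot noise of the increment only
    return clean + delta, noisy + increment
```

Because `delta` must be non-negative, this augmentation can only push pairs brighter, which is exactly the defect the journal version compensates for by also augmenting real dark frames.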

Based on the understanding that dark shading is the temporally stable component of read noise, we propose Dark Shading Correction (DSC) to reduce the complexity of the real noise distribution. Benefiting from the reduced noise complexity, the learnability-enhanced data mapping yields more accurate colors in denoised images.
In the journal version, we additionally model how dark shading depends on exposure time. In my experiments with visible-light cameras, the black level error (BLE) generally changes with exposure time, while the FPN is relatively stable.

(Figure) Calibrating the FPN with exposure time t as the independent variable shows that the FPN is basically independent of t.

(Figure) Calibrating the BLE with exposure time t as the independent variable shows that BLE is basically linear in t (note: it may be piecewise linear due to unknown circuit mode changes).
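The two calibration observations above suggest the decomposition D(t) = FPN + k·t + b. A hedged sketch of calibration and correction under that assumed model (function names and interfaces are mine, not the released calibration code):

```python
import numpy as np

def calibrate_dark_shading(dark_frames_by_t):
    """Sketch of calibrating the extended linear dark shading model.

    Assumed decomposition (matching the observations above):
        D(t) = FPN + BLE(t),  BLE(t) = k * t + b,
    where FPN is a per-pixel pattern independent of exposure time t and
    BLE is a global black level error, linear in t.

    dark_frames_by_t : dict mapping exposure time t -> (N, H, W) stack
                       of dark frames captured at that exposure time
    """
    ts = sorted(dark_frames_by_t)
    means = np.stack([np.asarray(dark_frames_by_t[t]).mean(axis=0) for t in ts])
    ts = np.asarray(ts, dtype=float)
    ble = means.mean(axis=(1, 2))                # global offset per exposure time
    k, b = np.polyfit(ts, ble, 1)                # linear fit of BLE(t)
    fpn = (means - ble[:, None, None]).mean(0)   # exposure-independent residual
    return fpn, k, b

def dark_shading_correction(noisy, t, fpn, k, b):
    """DSC: subtract the calibrated dark shading before denoising."""
    return noisy - (fpn + k * t + b)
```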

Constructing data with few learnability defects

The devil is in the data

While doing noise modeling work, I kept noticing an anomaly I did not quite understand: why does a denoising network trained on paired real data perform so much worse on the ELD dataset than on the SID [4] dataset (2 dB lower than ELD's noise modeling)?
Data analysis is the only way to improve the algorithm. Combining the experiments' numerical metrics, the output results, and the original images of the dataset, I found that the SID dataset's problems are actually not small: the network we trained with SID's paired real data is not a pure denoising network. The SID dataset is full of various data defects that make its data not so "paired" in the denoising sense. The problems include, but are not limited to:

  • Abnormal dark shading caused by the sensor's special design

(Figure) This pattern is the offset that the Sony A7S2 directly compensates on the raw image. The larger the aperture, the more obvious it is; it can also be seen in dark shading captured without light. Because of this strange pattern, the data mapping is not a pure denoising mapping: paired data can learn it, but noise modeling cannot, and should not have to account for this dataset defect.
  • Spatial misalignment caused by unreasonable acquisition settings

(Figure) PMN denoising results vs. GT, with pixel-wise metrics computed in the sRGB domain. There is a moving object in the outdoor long-exposure GT; the long streak in the sky is a bright spot in the short exposure, presumably an airplane.
  • Residual noise in high ISO scenes

(Figure) The left image is the ISO-25600 GT in SID; the right image is the result after simple denoising.
  • Some brightness misalignments that are hard to verify rigorously

  • ……

In fact, not only SID: other existing denoising datasets also have data defects to varying degrees, leaving them deficient in one or more of the key data quality dimensions of volume, diversity, and pairing (i.e., whether misalignment exists).

As we said earlier:

There are many difficulties in data collection under low light, and trade-offs are common; pairing in particular is often sacrificed, so the training data and the actual scene are not identically distributed, and what the network learns is not a pure denoising mapping at all.

In fact, most datasets can solve the problems of volume and diversity with enough effort. The real difficulty in building datasets lies in pairing, and in how to ensure volume and diversity under strict pairing requirements.

Our goal is to build a low-light denoising dataset that is as free of data defects as possible, enhancing the learnability of the data mapping by directly improving the quality of the paired real data.
Two rigorous dataset production processes are worth referring to here: ELD and SIDD [5]. ELD's production method is relatively simple, mainly adjusting ISO and long/short exposures to build the dataset as efficiently as possible. SIDD builds its dataset through multi-frame fusion, with a systematic pipeline of dead-pixel removal, image screening, correction, alignment, and fusion, which can almost be regarded as a concise multi-frame raw image denoising pipeline.

(Figure) The production process of the SIDD dataset.

SIDD also made a key discovery: even with the camera fixed on a vibration-isolated optical table, misalignment still appears as the number of captured frames grows, up to 2-4 pixels in their experiments! The SIDD authors attributed this misalignment to the phone's OIS (Optical Image Stabilization) not being fully disabled. After reproducing this phenomenon with multiple devices at multiple locations, we found that this misalignment is not entirely caused by OIS.
In our experiments, turning OIS off slightly reduces short-term misalignment, but long-term misalignment is unaffected by the switch. Another stability experiment by Fan Haoqiang hints at the essence of the phenomenon: this spatial misalignment is caused by unmeasurable environmental disturbances together with over-damped friction in the phone clamp.

(Figure) Spatial misalignment of a phone camera fixed on a tripod. "Short time" corresponds to within 5 seconds; "long time" is about 4×90 seconds.

This finding is extremely important for dataset production. It means that whether indoors or outdoors, a low-light denoising dataset must account for spatial misalignment. Alignment algorithms have limited accuracy, especially for large motions, so dataset production cannot rely entirely on post-processing to correct spatial misalignment. Instead, long capture sessions should be avoided during collection, reducing the spatial misalignment caused by unpredictable environmental disturbances.

Smartphone Low Light Raw Denoising (LRID) Dataset

The deficiencies of denoising datasets themselves are the culprit limiting the learnability of the data mapping. They mainly appear in four aspects: insufficient diversity, noisy GT, brightness misalignment, and spatial misalignment. Insufficient diversity leads to overfitting of the data mapping. Noisy GT makes the data mapping hard to converge to the optimum. Brightness misalignment biases the data mapping. Spatial misalignment introduces errors into the data mapping.
Unfortunately, existing denoising datasets are clearly deficient in one or more of these aspects, so in terms of learnability they struggle to meet the low-light denoising requirements of real scenes. Our goal is to collect a high-quality dataset to validate the upper bound of the learnability enhancement strategy.

  • Data collection stage

We collected a high-quality dataset with a Redmi K30 using the IMX686 sensor. The Smartphone Low Light Raw Denoising (LRID) dataset contains 138 scenes. For each scene, we first captured 25 long-exposure images at ISO-100, then immediately captured 5 groups (10 images per group) of short-exposure images at ISO-6400. We controlled the phone remotely with a program to avoid vibration, and the interval of semi-automatic continuous capture is very short (about 0.01 s/frame), so the short-exposure images can be regarded as free of spatial misalignment.
Indoor scenes were collected in various enclosed spaces with various color temperatures and light intensities. Each indoor scene has 5 groups of short-exposure images, with long/short exposure time ratios of 64, 128, 256, 512, and 1024. The total long-exposure time is about 25 seconds.
Most outdoor scenes were shot at midnight in calm weather (wind speed generally below 0.5 m/s). Each outdoor scene has 3 groups of short-exposure images, with long/short exposure time ratios of 64, 128, and 256. The total long-exposure time is about 64 seconds.
A few indoor scenes were shot with a neutral-density filter, following the camera settings of the outdoor scenes.


The acquisition settings above are a compromise reached after weighing the possible data defects. The main considerations are:

  • The total exposure time is limited (< 64 s) to avoid spatial/brightness misalignment caused by environmental disturbances and changes (gusts shaking high foliage, changing lights in urban residential buildings, the sky starting to brighten around 4:20 am in summer, vibrations from passing cars, etc.)

  • The minimum exposure time is 10 ms, to avoid flicker introduced by ubiquitous 50 Hz AC light sources

  • 25 long-exposure shots per scene, because ISO-100 on the phone is still visibly noisy and subsequent multi-frame fusion is needed for denoising

(Figure) Noisy is a single image captured at ISO-6400; SIDD-style is the fusion of 64 images captured at ISO-6400; ELD-style is a single image captured at ISO-100; Ours is the fusion of 25 images captured at ISO-100. (Best viewed zoomed in.)
  • 10 short exposures per group, because noise sampling also needs diversity; it is better for the GT-to-noisy-image ratio to exceed 1:4

(Figure) Ablation experiment: training the denoising network with data of different diversity; performance is close to saturation under our setting.
  • Data processing stage

Changing the noisy image would destroy the noise model, so the reasonable approach is to move the relatively clean GT to the noisy image's spatial position. Since the capture interval is very short under our acquisition settings, the spatial position of the noisy image can be approximated by the last frame of the ISO-100 long-exposure sequence; that is, the last frame serves as the reference frame for GT estimation.

(Figure) Our GT estimation pipeline; the whole process is highly similar to burst/video denoising.

This pipeline incorporates much of my understanding of traditional image processing algorithms. The content and details are relatively involved, so I will not repeat them here; please refer to the original paper. Here we only briefly show the overall effect of each module on an outdoor scene.

(Figure) The pipeline robustly handles small moving objects outdoors and tries to keep the effects of spatial misalignment, brightness misalignment, and noise out of the GT.
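As a toy illustration of the reference-frame idea (not the paper's actual pipeline, which also aligns frames and removes dead pixels), one can fuse frames onto the last one with a crude outlier rejection; all names here are my assumptions:

```python
import numpy as np

def estimate_gt(long_frames, ref_index=-1):
    """Toy sketch of GT estimation with the last frame as reference.

    The last long-exposure frame is temporally adjacent to the noisy
    short-exposure frames, so it is taken as the spatial reference and the
    other frames are fused onto it. A simple deviation threshold stands in
    for the paper's robust motion masking.
    """
    stack = np.asarray(long_frames, dtype=np.float64)   # (N, H, W)
    ref = stack[ref_index]
    dev = np.abs(stack - ref)                           # deviation from reference
    thresh = 3.0 * np.median(dev) + 1e-6
    weights = (dev < thresh).astype(np.float64)         # reject moving content
    weights[ref_index] = 1.0                            # reference always kept
    return (weights * stack).sum(0) / weights.sum(0)
```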

Experimental results

The metrics in the Raw-domain noise modeling literature are all computed in the raw domain; what you see are rawpy visualizations.
I put some of the sRGB-domain metrics that low-light image enhancement researchers care about at the end of the conference-version interpretation article; you can also compute them yourself from the public images and weights in the network drive.

Horizontal comparison

(Figure) The four rows from top to bottom: SID dataset, ELD dataset, LRID-Indoor, LRID-Outdoor.

Both the ELD and SID datasets were collected with a Sony A7S2, yet models trained on their paired data perform very differently. On the SID dataset, the model trained with paired real data performs close to SFRN. On the ELD dataset, however, models trained on paired real data are significantly worse than models trained on ELD's synthetic data.

Combined with our method's results, this observation shows that the data defects caused by unreasonable acquisition settings in the SID dataset do lead to fragile learnability. Fragile learnability makes the data mapping provided by real paired data inaccurate, reducing both performance and generalization. Our method achieves state of the art on both datasets simultaneously and significantly outperforms previous methods, which shows that compensating for data learnability is essential: it improves both performance and generalization.

Ablation experiment

(Table) Modules marked * use the previous (MM) version of the method.

(Table) Modules marked * use the previous (MM) version of the method.

Without any learnability enhancement, a denoising model trained on paired real data produces blurry and noisy results. It is very difficult for a neural network to learn a precise and accurate data mapping from paired real data with insufficient learnability.

SNA usually does not improve the numerical metrics much, but it significantly improves the resolution of the denoised image. This gain in visual quality mainly comes from the more precise fit enabled by the increased data volume. Compared with the old SNA (SNA*), the new SNA augments more comprehensively and achieves higher mapping precision. However, without correcting dark shading, the network may overfit a biased data mapping, which sometimes leads to more significant color casts.

DSC usually improves the numerical metrics considerably and markedly reduces the color cast caused by dark shading in the denoised image. This gain in visual quality mainly comes from the more accurate fit enabled by reduced noise complexity. Compared with the old DSC (labeled DSC*), the new DSC has a more accurate dark shading model and achieves higher mapping accuracy. Notably, though, DSC does not significantly improve denoising precision; that relies more on SNA.

All in all, SNA and DSC complement each other; the best performance comes only from adopting the complete learnability enhancement strategy.

Discussion about DSC

Dark shading across different cameras with the same sensor

(Table) The paper uses SonyA7S2-4; when we added this experiment we found that SonyA7S2-2 actually gives the best denoising performance.

Dark shading calibrated on different cameras does lead to different denoising performance, but their metrics are close, and still significantly higher than previous work. The results show that dark shading calibrated on different cameras with the same sensor is highly similar, so our proposed DSC is feasible under the paper's settings and generalizes to some extent.

Dark shading of different sensors

(Table) IMX686 is the sensor used by LRID; OV48C and GN2 are also mobile phone sensors; SC410GS is a surveillance camera sensor.

In our observation, few sensors are free of obvious dark shading, which usually appears at high ISO and high gain. Dark shading is also quite common in academic low-light denoising datasets (such as starlights [6] from CVPR 2022). These observations show that our proposed DSC is necessary and has a wide range of application scenarios.

Extending noise modeling with DSC

SNA is essentially a specialized form of shot noise modeling; the real extension to pure noise modeling is DSC. The linear dark shading model can be directly used to retrofit existing noise modeling schemes.

  • For noise modeling methods that never considered temporally stable noise (such as Poisson-Gaussian models), DSC can be applied directly at the inference stage without changing the training strategy.

  • For noise modeling methods that already consider BLE (e.g., RethinkNM [7], ELD), there are two approaches. If the sensor's BLE is stable, the training strategy need not change: use dark shading without BLE at inference. If the sensor's BLE is not stable, use only the temporal noise in noise modeling during training and apply DSC at inference.

  • For methods that implicitly include dark shading in their noise modeling (such as SFRN and learning-based noise modeling), the training strategy must change: use only the temporal noise in the noise model during training, and apply DSC at inference.
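The last strategy (temporal noise only during training, DSC at inference) can be sketched as follows; the plain Poisson-Gaussian temporal model, `K`, and `sigma_read` are placeholders for whatever temporal noise model the method actually uses:

```python
import numpy as np

def synth_temporal_noise(clean, K, sigma_read, rng):
    """Training-time synthesis with temporal noise only.

    Dark shading (the temporally stable part of read noise) is deliberately
    NOT synthesized here, because it will be removed at inference by DSC.
    """
    shot = K * rng.poisson(clean / K)                 # signal-dependent shot noise
    read = rng.normal(0.0, sigma_read, clean.shape)   # zero-mean temporal read noise
    return shot + read

def infer_with_dsc(denoiser, noisy, dark_shading):
    """Inference: correct dark shading first, then run the network that was
    trained on temporal-noise-only data."""
    return denoiser(noisy - dark_shading)
```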

SFRN's case is slightly more complicated and requires solving some engineering issues; see the original paper for details. In addition, we were pleasantly surprised to find that LLD [8] (CVPR 2023, from Prof. Zuo's group) already uses this approach: DSC is used in LLD*, which significantly improves the performance of learning-based noise modeling.


Limitations

  • When there is a large temperature difference between dark shading calibration and actual application, DSC's effectiveness may drop.

    The network is robust to dark shading differences at normal operating temperatures. The dark shading in the section "Dark shading across different cameras with the same sensor" was not strictly temperature-controlled; cameras No. 2 and No. 3 were actually slightly above normal operating temperature (a bit hot by the end of calibration), yet PMN's DSC still covers them. If the temperature is too high, however, the dark shading pattern may change irregularly.

  • Switching of the sensor circuit may make the dark shading parameters inappropriate; re-calibrating dark shading is recommended if the circuit changes significantly.

    The Sony A7S2 is a dual-native-ISO camera, so we calibrated dark shading separately on either side of ISO-1600. The IMX686 used by LRID has a temperature/long-exposure control mechanism whose trigger condition is unknown and sometimes switches circuits suddenly, so we calibrated two sets of dark shading models.

Postscript

The scenario PMN truly fits best: a trilogy plan split apart

In fact, these two papers of mine (the MM and TPAMI versions of PMN) are the result of a half-finished project dismantled into three parts: the first two installments of a screen-shooting trilogy.

Progress and Time Management in Algorithm Research - Zhihu (2022.01.01) 
... 
The project for which I am currently preparing a paper submission has gone through exactly the process above, and I can share it here for reference. When the project was established, it was expected to center on a certain kind of electronic device, bypassing the difficulty of data collection in order to solve the data collection problem, and then improving task performance by solving the data problem [PS: shooting a screen]. This solution introduces new modules to solve old problems; it optimizes the image processing system rather than the module itself. Its biggest uncertainty is that the introduced electronic device is itself not fully controllable: such devices have errors and defects of their own, and new defects are introduced in the process of bypassing the inherent ones [PS: screen-capture degradation]. In this process, I developed a new flexible data augmentation algorithm to compensate for these shortcomings [PS: SNA]. At the same time, in comparisons I found and solved a problem neglected by predecessors [PS: DSC]. After discovering that the defects of the intended solution generally outweigh its advantages, I chose to solve specific problems under specific conditions [PS: low light/extreme darkness], and finally obtained a useful experimental conclusion. However, a specialized scheme cannot be compared on public datasets, so its persuasiveness is limited and it is easy to reject; I therefore needed to build a dataset that met my needs [PS: LRID], and to try to publish the work once it proved feasible. This is where my experimental planning paid off.
At the very beginning of the project I had designed a data collection experiment that seemed of limited use at the time, and the data augmentation algorithm developed during the research, together with the fix for the problem previous work had overlooked, could be reused as content for the data set work. I no longer needed to rack my brains for innovations beyond the data set itself, which sped up my experiments considerably, so that I finally had enough time to polish my first paper, which had not yet appeared.
...

In fact, the screen-capture data set is a plan I conceived in January 2021, after seeing the stop-motion animation data set of RViDeNet [9]. For various reasons the research only started in August 2021. After about half a year of experiments, while fixing the defects of the screen-capture scheme, I arrived at the basic solutions of SNA (data augmentation), DSC (noise model correction), and LRID (data set). They were originally products of a single project, so they cohere rather strongly. My advisor keeps strict control over paper quality: a "hodgepodge" is unacceptable, yet I was reluctant to drop any important component, so I had to take a gamble, dismantle the screen-shooting project, and submit the parts separately. Here I sincerely thank my advisor for the logical support and threading; otherwise I could not have reassembled the dismantled pieces into a whole.

As for the originally planned third part, "screen shooting", it probably will not come out. After all, its core innovation has already been carved out into PMN, and the screen-shot data set as a carrier has also been discovered by other peers within the past year (ReCRVD [10], RMD [11]). Although I believe our unique experience in data acquisition and in handling screen degradation remains an advantage, this engineering skill alone is really not enough to publish a paper.

On learning-based noise modeling: our next work

When discussing "the dilemma of pure noise modeling" earlier, we raised the following set of questions:

If we have plenty of clean-noisy paired real data, is there really such an urgent need for noise modeling?
If the amount of data is insufficient, can we really learn a complex noise model?
Even if the amount of data is sufficient, can existing methods guarantee that the noise model is learned accurately rather than overfitted to the data set?

These questions arise because we found that existing learning-based noise modeling is generally dragged down by its data and exhibits various learnability defects:

  • Underfitting the noise model

    This manifests as the noise model of synthetic data usually differing from that of real data (inconsistent intensity [1] and inconsistent pattern [2]), which leads to inaccurate denoising results.

  • Overfitting to the training scene

    This manifests as the neural network learning the data biases introduced by an imperfect data acquisition process (spatial misalignment [3] and brightness misalignment [4]), which leads to inaccurate denoising results.


CAGAN (ECCV 2020). The significant edge residual in the noise map should be spatial misalignment between the SIDD non-reference-frame noisy image and the GT, not actual noise.


Starlight (CVPR 2022). The network appears to have learned some kind of brightness misalignment present in the training data set.
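To make the "underfitting the noise model" point concrete, here is a minimal sketch of the common Poisson-Gaussian baseline used for synthetic raw noise (shot noise plus read noise). It is an illustration only, not the noise model of PMN or of any cited paper; the gain `K` and read-noise `sigma_read` values are made-up example numbers. Precisely because this simple model omits real-sensor components such as row noise, quantization, and dark shading, data synthesized this way can differ in intensity and pattern from real low-light noise.

```python
import numpy as np

def synthesize_raw_noise(clean, K=0.25, sigma_read=2.0, rng=None):
    """Add baseline Poisson-Gaussian noise to a clean raw image (values in DN).

    K: system gain in DN per electron (hypothetical example value).
    sigma_read: read-noise standard deviation in DN (hypothetical example value).
    Real sensors also exhibit row noise, quantization, dark shading, etc.,
    which this baseline deliberately ignores.
    """
    rng = np.random.default_rng(rng)
    photons = np.maximum(clean, 0) / K                # expected electron count
    shot = rng.poisson(photons) * K                   # signal-dependent shot noise, back in DN
    read = rng.normal(0.0, sigma_read, clean.shape)   # signal-independent read noise
    return shot + read

clean = np.full((8, 8), 100.0)  # flat gray patch at 100 DN
noisy = synthesize_raw_noise(clean, K=0.25, sigma_read=2.0, rng=0)
# For this model, variance ≈ K * mean + sigma_read**2 = 0.25*100 + 4 = 29 DN^2
```

Fitting a denoiser only to such synthetic pairs and then evaluating on real raw frames is exactly the setting where the intensity/pattern mismatch above shows up as residual noise or smeared detail.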

Huh, does this smell like "learnability enhancement"? Yes, this is also the methodology of our next work.

The new work shares its back story with the incremental part of the journal version of PMN, but this time we switch technical routes and directly transform learning-based noise modeling. In the new work we confront the problems of data quality and distribution measurement, and break the "pure noise modeling dilemma" above one by one! This pure noise modeling work has been submitted, and its effect is even more distinctive than PMN's, so stay tuned~


References

[1] https://fenghansen.github.io/publication/PMN/
[2] https://ieeexplore.ieee.org/abstract/document/9511233/
[3] https://openaccess.thecvf.com/content/ICCV2021/papers/Zhang_Rethinking_Noise_Synthesis_and_Modeling_in_Raw_Denoising_ICCV_2021_paper.pdf
[4] https://openaccess.thecvf.com/content_cvpr_2018/html/Chen_Learning_to_See_CVPR_2018_paper.html
[5] https://openaccess.thecvf.com/content_cvpr_2018/html/Abdelhamed_A_High-Quality_Denoising_CVPR_2018_paper.html
[6] https://openaccess.thecvf.com/content/CVPR2022/html/Monakhova_Dancing_Under_the_Stars_Video_Denoising_in_Starlight_CVPR_2022_paper.html
[7] https://ieeexplore.ieee.org/document/9428259
[8] http://openaccess.thecvf.com/content/CVPR2023/html/Cao_Physics-Guided_ISO-Dependent_Sensor_Noise_Modeling_for_Extreme_Low-Light_Photography_CVPR_2023_paper.html
[9] https://openaccess.thecvf.com/content_CVPR_2020/html/Yue_Supervised_Raw_Video_Denoising_With_a_Benchmark_Dataset_on_Dynamic_CVPR_2020_paper.html
[10] https://arxiv.org/abs/2305.00767
[11] https://ieeexplore.ieee.org/abstract/document/10003653/



Origin blog.csdn.net/amusi1994/article/details/132595249