Instance segmentation in low light conditions

This paper opens up a new research direction: instance segmentation in low-light conditions. It is the first work to systematically establish a training and evaluation framework for instance segmentation under low light.

Topic: Instance Segmentation in the Dark

Affiliation: Beijing Institute of Technology & Princeton

Author: Linwei Chen · Ying Fu · Kaixuan Wei · Dezhi Zheng · Felix Heide

Paper link:

https://arxiv.org/abs/2304.14298

https://link.springer.com/article/10.1007/s11263-023-01808-8

Code link:

GitHub - Linwei-Chen/LIS: IJCV2023 Instance Segmentation in the Dark

Citation:

@article{2023lis,
  title={Instance Segmentation in the Dark},
  author={Chen, Linwei and Fu, Ying and Wei, Kaixuan and Zheng, Dezhi and Heide, Felix},
  journal={International Journal of Computer Vision},
  volume={131},
  number={8},
  pages={2198--2218},
  year={2023},
  publisher={Springer}
}

This paper opens up a new research direction, being the first to systematically establish a training and evaluation framework for instance segmentation under low-light conditions (a promising new direction to dig into!).

This paper collects and releases the Low-light Instance Segmentation (LIS) dataset, which contains four aligned sets of images — low-light and normal-exposure, each in paired JPEG and RAW formats — with instance-level pixel annotations for 8 object categories, usable for both instance segmentation and object detection (new dataset!).

This paper observes that RAW images have greater potential than JPEG images for achieving high instance segmentation accuracy, and the authors further attribute this to the higher bit depth that RAW data provides (RAW is all you need!).
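As a toy illustration of the bit-depth point (the numbers below are made up for demonstration, not taken from the paper), consider quantizing a very dark gradient at 8 bits versus 14 bits and then brightening both:

```python
import numpy as np

# A very dark gradient, spanning ~0.4% of the sensor's full scale.
signal = np.linspace(0.0, 0.004, 64)

def quantize(x, bits):
    """Round to the nearest representable code value at this bit depth."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

# Brighten by 250x after quantization, as a low-light pipeline would.
bright_8bit = quantize(signal, 8) * 250    # sRGB-like 8-bit output
bright_14bit = quantize(signal, 14) * 250  # RAW-like 14-bit output

# 8 bits collapse the whole gradient to 2 code values; 14 bits keep it.
print(len(np.unique(bright_8bit)), len(np.unique(bright_14bit)))
```

The 8-bit version retains only two distinct levels across the gradient, while the 14-bit version preserves all 64 — exactly the kind of shadow detail the segmentation network needs.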

This paper observes that under low-light conditions, image noise introduces high-frequency disturbances into the features of deep neural networks, which is a major reason why existing instance segmentation methods perform poorly in the dark (noise is the key!).

How well does it work? Built on Mask R-CNN with a ResNet-50 backbone, the proposed framework still holds up well even compared against Segment Anything, a large model trained on massive amounts of data.


Abstract
Existing instance segmentation techniques are primarily tailored for high-visibility inputs, but their performance significantly deteriorates in extremely low-light environments. In this work, we take a deep look at instance segmentation in the dark and introduce several techniques that substantially boost the low-light inference accuracy. The proposed method is motivated by the observation that noise in low-light images introduces high-frequency disturbances to the feature maps of neural networks, thereby significantly degrading performance. To suppress this “feature noise”, we propose a novel learning method that relies on an adaptive weighted downsampling layer, a smooth-oriented convolutional block, and disturbance suppression learning. These components effectively reduce feature noise during downsampling and convolution operations, enabling the model to learn disturbance-invariant features. Furthermore, we discover that high-bit-depth RAW images can better preserve richer scene information in low-light conditions compared to typical camera sRGB outputs, thus supporting the use of RAW-input algorithms. Our analysis indicates that high bit-depth can be critical for low-light instance segmentation. To mitigate the scarcity of annotated RAW datasets, we leverage a low-light RAW synthetic pipeline to generate realistic low-light data. In addition, to facilitate further research in this direction, we capture a real-world low-light instance segmentation dataset comprising over two thousand paired low/normal-light images with instance-level pixel-wise annotations. Remarkably, without any image preprocessing, we achieve satisfactory performance on instance segmentation in very low light (4% AP higher than state-of-the-art competitors), meanwhile opening new opportunities for future research. Our code and dataset are publicly available to the community (https://github.com/Linwei-Chen/LIS).

Observations and Motivation

Two key observations:

a. Feature map degradation under low light. For clean normal-light images, the instance segmentation network clearly captures low-level features (e.g., edges) in shallow layers and high-level features (i.e., semantic responses) in deep layers. However, for noisy low-light images, shallow features can be corrupted and full of noise, while deep features show weaker semantic responses to objects.

b. Comparison between the camera's sRGB output and the RAW image in the dark. Due to the significantly reduced signal-to-noise ratio, the 8-bit camera output loses much of the scene information; for example, the seat-back structure is barely discernible in the camera output, while it is still recognizable in the RAW image (zoom in for better detail).


Challenges and methods

The overall approach is as follows

Low-Light RAW Synthetic Pipeline

Challenge: Training a segmentation model requires massive data with instance segmentation annotations, and no such low-light dataset exists. Collecting and labeling a large-scale low-light dataset is expensive, and existing RAW data is also very scarce.

Solution: Design a pipeline that converts sRGB images into noisy low-light RAW images, so that existing instance segmentation datasets can be used, at zero cost, to train a RAW-input instance segmentation model for low-light conditions.

Our low-light RAW synthetic pipeline consists of two steps, unprocessing and noise injection:

Unprocessing. Collecting large-scale RAW image datasets is expensive and time-consuming, so we consider leveraging existing sRGB image datasets (Everingham et al., 2010; Lin et al., 2014). sRGB images are obtained from RAW images through a series of image transformations in the camera's internal image signal processing (ISP), such as tone mapping, gamma correction, color correction, white balance, and demosaicking. Through the unprocessing operation (Brooks et al., 2019), we can invert these image processing transformations and thus obtain RAW images. In this way, we can create a RAW dataset at zero cost.
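A minimal numpy sketch of what unprocessing involves. The gamma inversion follows the standard sRGB formula; the white-balance gains and the RGGB Bayer layout are illustrative placeholders, and the full pipeline of Brooks et al. also inverts tone mapping and color correction:

```python
import numpy as np

def srgb_to_linear(srgb):
    """Invert the standard sRGB gamma curve (one ISP step being reversed)."""
    srgb = np.clip(srgb, 0.0, 1.0)
    return np.where(srgb <= 0.04045,
                    srgb / 12.92,
                    ((srgb + 0.055) / 1.055) ** 2.4)

def invert_white_balance(linear_rgb, gains=(2.0, 1.0, 1.6)):
    """Divide out hypothetical per-channel white-balance gains."""
    return linear_rgb / np.asarray(gains)

def mosaic_bayer(linear_rgb):
    """Re-mosaic to an RGGB Bayer pattern (the inverse of demosaicking)."""
    h, w, _ = linear_rgb.shape
    raw = np.empty((h, w), dtype=linear_rgb.dtype)
    raw[0::2, 0::2] = linear_rgb[0::2, 0::2, 0]  # R
    raw[0::2, 1::2] = linear_rgb[0::2, 1::2, 1]  # G
    raw[1::2, 0::2] = linear_rgb[1::2, 0::2, 1]  # G
    raw[1::2, 1::2] = linear_rgb[1::2, 1::2, 2]  # B
    return raw
```

Chaining these three steps turns a clean sRGB crowd-sourced image into a plausible clean RAW image, ready for the noise injection step below.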

Noise injection. After obtaining a clean RAW image through unprocessing, in order to simulate a real noisy low-light image, we need to inject noise into the RAW image. To produce more accurate results for real, complex noise, we adopt a recently proposed physics-based noise model (Wei et al., 2020, 2021) instead of the widely used Poissonian-Gaussian noise model (i.e., the heteroscedastic Gaussian model (Foi et al., 2008)). The model can accurately characterize real noise structure by taking into account many noise sources, including photon shot noise, read noise, banding pattern noise, and quantization noise.
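A simplified numpy sketch of such noise injection. The parameter names and values here are illustrative placeholders, not the paper's calibrated camera statistics:

```python
import numpy as np

def inject_low_light_noise(raw, light_factor=50, K=0.01,
                           read_sigma=0.002, row_sigma=0.001,
                           bit_depth=14, rng=None):
    """Darken a clean RAW image and add physics-inspired noise sources."""
    rng = np.random.default_rng() if rng is None else rng
    dark = raw / light_factor                 # simulate a short exposure
    # Photon shot noise: Poisson in photo-electrons, scaled by system gain K.
    noisy = rng.poisson(dark / K) * K
    # Read noise: per-pixel Gaussian.
    noisy = noisy + rng.normal(0.0, read_sigma, raw.shape)
    # Banding (row) noise: one Gaussian offset shared by each row.
    noisy = noisy + rng.normal(0.0, row_sigma, (raw.shape[0], 1))
    # Quantization noise from the ADC's finite bit depth.
    step = 1.0 / (2 ** bit_depth - 1)
    noisy = np.round(noisy / step) * step
    return np.clip(noisy, 0.0, 1.0)
```

Each term maps onto one of the noise sources listed above; a faithful implementation would additionally sample the gain and noise parameters from calibrated per-camera distributions.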


Adaptive Weighted Downsampling Layer

Challenge: How to suppress the feature disturbance caused by image noise. A simple observation: because natural images have a smoothness prior, downsampling the image with a low-pass filter reduces the noise level.

Considering that existing networks already contain multiple downsampling layers, why not take full advantage of them? Experiments show that simply inserting a mean filter into these layers improves low-light instance segmentation at almost no cost. Although effective, a fixed filter such as the mean filter cannot adapt to the features and may therefore erase detail. To address this, the authors propose the Adaptive Weighted Downsampling (AWD) layer, which predicts a low-pass filter channel-by-channel and point-by-point: the low-pass strength is increased in noisy regions and reduced in detailed regions to preserve detail. (Looking at the released code, the FC layer is replaced by a depth-wise convolution, with equivalent effect.) The formulas are omitted here; if you are interested, read the original paper.
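The core mechanism can be sketched in numpy as follows. This is a simplified stand-in for the paper's layer: the per-location filter logits are passed in directly, whereas AWD predicts them from the features with a small sub-network:

```python
import numpy as np

def adaptive_weighted_downsample(feat, weight_logits):
    """Downsample a feature map by 2x with a per-location, per-channel
    3x3 low-pass filter whose weights come from predicted logits.
    feat:          (C, H, W) feature map, H and W even
    weight_logits: (C, 9, H//2, W//2) filter logits for each output location
    """
    C, H, W = feat.shape
    Ho, Wo = H // 2, W // 2
    # Softmax turns logits into a normalized (sum-to-one) low-pass kernel;
    # flat logits recover plain 3x3 average pooling.
    w = np.exp(weight_logits - weight_logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    padded = np.pad(feat, ((0, 0), (1, 1), (1, 1)), mode="edge")
    out = np.zeros((C, Ho, Wo), dtype=feat.dtype)
    # Accumulate the 3x3 neighborhood around each stride-2 center.
    for k, (dy, dx) in enumerate((dy, dx) for dy in range(3) for dx in range(3)):
        out += w[:, k] * padded[:, dy:dy + H:2, dx:dx + W:2]
    return out
```

With uniform logits this reduces to mean-filter downsampling; logits that concentrate on the center tap reduce to plain strided sampling, which is how the layer can keep detail where the features are clean.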

Smooth-Oriented Convolutional Block

To further reduce the disturbance of features by high-frequency image noise, the authors also apply re-parameterization: a set of smoothing convolution kernels is trained alongside the original kernels and then fused into them at inference time, making the convolutions more robust to noise. Notably, this adds no parameters or computation at inference — a free lunch!
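Because convolution is linear in its kernel, parallel branches can be merged exactly. A toy numpy demonstration of this fusion (the kernels and the box-smoothing branch are illustrative, not the paper's exact block):

```python
import numpy as np

def conv2d(img, kernel):
    """Naive 'valid' 2-D correlation, enough for this demo."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
# Training-time block: a learnable 3x3 kernel plus a parallel smoothing
# branch (a fixed 3x3 box kernel scaled by a learned weight alpha).
learned = rng.normal(size=(3, 3))
alpha = 0.3
smooth = np.full((3, 3), 1.0 / 9.0)

x = rng.normal(size=(8, 8))
train_out = conv2d(x, learned) + alpha * conv2d(x, smooth)

# Inference-time: the two branches fuse into a single kernel --
# identical output, zero extra parameters or FLOPs.
fused = learned + alpha * smooth
infer_out = conv2d(x, fused)
assert np.allclose(train_out, infer_out)
```

This is the same trick popularized by re-parameterized architectures such as RepVGG: the extra branch shapes the kernel during training, then disappears into it at deployment.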

Disturbance Suppression Learning

At the same time, the authors adjust training so that the model learns from clean and noisy images simultaneously, constraining the features of noisy inputs to stay close to those of their clean counterparts. This is somewhat like knowledge distillation, but requires no teacher. It improves robustness not only in low light but also on normally lit images, which matches the practical scenario well: one model can handle both day and night.

LIS Dataset

The dataset was captured with a Canon EOS 5D Mark IV and has the following characteristics:

  • Paired samples. In the LIS dataset, we provide images in sRGB-JPEG (typical camera output) and RAW formats, each format including paired short-exposure low-light and corresponding long-exposure normal-light images. We refer to these four types of images as sRGB-Dark, sRGB-Normal, RAW-Dark and RAW-Normal. To ensure they were aligned at the pixel level, we mounted the camera on a sturdy tripod and controlled it remotely via a mobile app to avoid vibration.

  • Diverse scenes. The LIS dataset consists of 2230 image pairs collected in various scenes, both indoor and outdoor. To diversify the low-light conditions, we shot long-exposure reference images at a range of ISO levels (e.g., 800, 1600, 3200, 6400), and deliberately reduced the exposure time by a range of low-light factors (e.g., 10, 20, 30, 40, 50, 100) to shoot short-exposure images simulating very low-light conditions.

  • Instance-level pixel-wise labels. For each pair of images, we provide precise instance-level pixel-wise labels annotating instances of the 8 most common object categories in daily life (bicycle, car, motorcycle, bus, bottle, chair, dining table, tv). We note that LIS contains images taken in different scenes (indoor and outdoor) and under different lighting conditions. As shown in Figure 7, object occlusion and densely distributed objects make LIS challenging beyond the low light itself.


Experimental results

Ablation and main results

The paper evaluates Mask R-CNN, PointRend, Mask2Former, and Faster R-CNN with ResNet-50, Swin Transformer, and ConvNeXt backbones, demonstrating effectiveness on both instance segmentation and object detection, along with visual results.

Inspiration

Detection and segmentation are hot, mainstream research directions, with papers appearing one after another. If you want to explore beyond the red ocean, the best way is to create a blue ocean yourself. This paper finds a new application scenario for instance segmentation, builds a framework for training and evaluating instance segmentation under low light, and derives a series of simple but effective methods from observations of key phenomena. Observation is the basis of scientific research: good observation lets people discover interesting phenomena and problems, leading to new questions and opening up new paths.

 

Origin blog.csdn.net/qq_29788741/article/details/133105596