Curriculum Model Adaptation with Synthetic and Real Data for Semantic Foggy Scene Understanding

This paper addresses semantic scene understanding for driving in foggy environments.

The method builds on the observation that semantic segmentation results under moderately adverse conditions (light fog) can be bootstrapped to solve the same problem under highly adverse conditions (dense fog). CMAda is extensible to other adverse conditions and provides a new paradigm for learning from both synthetic and unlabeled real data.

Furthermore, this paper makes three main independent contributions:

1) A new approach to add synthetic fog to real clear-sky scenes using semantic input;

2) A new fog density estimator;

3) A novel fog densification method, which increases the fog density in real foggy scenes whose depth is unknown.

The paper is divided into two main parts. The first simulates fog to synthesize images of foggy scenes; the second uses supervised learning to perform semantic segmentation under dense fog. The training data consists partly of synthetic foggy images and partly of real foggy images.

Part I Fog Simulation on Real Scenes

This part has three steps: depth outlier detection, robust depth-plane fitting at the SLIC superpixel level using RANSAC, and postprocessing of the completed depth map with guided image filtering.
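To make the middle step concrete, here is a minimal numpy sketch of RANSAC plane fitting as it would be applied to the 3-D points of one superpixel. This is an illustrative reconstruction, not the authors' implementation; the superpixel segmentation and guided filtering steps are omitted, and the function name and parameters are assumptions.

```python
import numpy as np

def ransac_plane(points, n_iters=200, thresh=0.05, rng=None):
    """Robustly fit a plane z = a*x + b*y + c to (N, 3) points with RANSAC."""
    rng = np.random.default_rng(rng)
    best_inliers, best_model = None, None
    for _ in range(n_iters):
        # minimal sample: 3 points define a candidate plane
        sample = points[rng.choice(len(points), 3, replace=False)]
        A = np.c_[sample[:, :2], np.ones(3)]
        try:
            coef = np.linalg.solve(A, sample[:, 2])
        except np.linalg.LinAlgError:
            continue  # degenerate (collinear) sample
        resid = np.abs(points[:, :2] @ coef[:2] + coef[2] - points[:, 2])
        inliers = resid < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, coef
    # refine by least squares on the consensus set
    P = points[best_inliers]
    A = np.c_[P[:, :2], np.ones(len(P))]
    coef, *_ = np.linalg.lstsq(A, P[:, 2], rcond=None)
    return coef, best_inliers
```

Depth outliers (e.g. stereo-matching errors) simply fall outside the consensus set, which is why this step is robust where plain least squares is not.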

Dual-Reference Cross-Bilateral Filter for Color and Semantics

First, the image to be synthesized is preprocessed. Instead of RGB, the CIELAB color space is used here.

Here is an introduction to the SLIC algorithm:

SLIC (Simple Linear Iterative Clustering) segments an image into superpixels by clustering pixels jointly in color and spatial coordinates.
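The color-plus-space clustering idea can be sketched as a k-means over joint (color, position) features. This is a toy illustration of the principle only, not the full SLIC algorithm (no local search window, no connectivity enforcement); all names and defaults here are assumptions.

```python
import numpy as np

def simple_slic(image, n_segments=16, compactness=10.0, n_iters=5):
    """Toy SLIC-style clustering: k-means over joint (color, position) features.

    image: (H, W, C) float array (e.g. CIELAB). Spatial coordinates are
    scaled by compactness / S, where S is the expected segment spacing, so
    `compactness` trades color homogeneity against spatial regularity."""
    H, W, C = image.shape
    S = np.sqrt(H * W / n_segments)          # grid spacing between seeds
    ys, xs = np.mgrid[0:H, 0:W]
    w = compactness / S
    feats = np.concatenate([image.reshape(-1, C),
                            w * np.c_[ys.ravel(), xs.ravel()]], axis=1)
    # seed cluster centres on a regular grid, as SLIC does
    gy = np.linspace(S / 2, H - S / 2, int(round(H / S)))
    gx = np.linspace(S / 2, W - S / 2, int(round(W / S)))
    seeds = np.array([(y, x) for y in gy for x in gx], dtype=int)
    centres = feats[seeds[:, 0] * W + seeds[:, 1]]
    for _ in range(n_iters):                 # plain k-means iterations
        d = ((feats[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for k in range(len(centres)):
            m = labels == k
            if m.any():
                centres[k] = feats[m].mean(0)
    return labels.reshape(H, W)
```

In practice one would use a tuned implementation such as `skimage.segmentation.slic` rather than this sketch.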

Both $p$ and $q$ are pixel positions; $G_s$ is the spatial Gaussian kernel and $G_c$ the color Gaussian kernel; $\hat{t}$ is the initial completed transmittance map being filtered; $J$ is the CIELAB image; $h(\cdot)$ maps pixels to semantic labels. The filter has the form

$$t(p) = \frac{\sum_{q \in N(p)} G_s(q - p)\,\big[\delta\big(h(q) - h(p)\big) + \mu\, G_c\big(J(q) - J(p)\big)\big]\, \hat{t}(q)}{\sum_{q \in N(p)} G_s(q - p)\,\big[\delta\big(h(q) - h(p)\big) + \mu\, G_c\big(J(q) - J(p)\big)\big]},$$

where $N(p)$ is the neighborhood of $p$, $\delta$ is the Kronecker delta, and $\mu$ balances the semantic and color references.

Restricting to pixels $q$ whose semantic label $h(q)$ matches that of the pixel being filtered prevents blurring across semantic edges. At the same time, $J$ helps preserve true depth edges: where the semantic labels differ, the pixels are compared in the color domain and the filtering is carried out by the color-domain kernel.

The filter is built from two independent reference grids, one corresponding to the semantics and one to the color domain. Filtering is performed on each grid separately, and the results are combined to form the dual-reference bilateral filter.
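A brute-force numpy sketch of such a dual-reference cross-bilateral filter follows. It is an illustrative reconstruction based on the description above (spatial kernel times a semantic-match term plus a weighted color kernel); the function name, parameter values, and the balancing constant `mu` are assumptions, and a real implementation would use grid-based acceleration rather than per-pixel loops.

```python
import numpy as np

def dual_reference_cbf(t_init, J, h, sigma_s=3.0, sigma_c=10.0, mu=0.5, radius=4):
    """Filter the transmittance map t_init (H, W), guided by both the
    semantic labelling h (H, W, int) and the CIELAB image J (H, W, 3)."""
    H, W = t_init.shape
    out = np.zeros_like(t_init)
    for y in range(H):
        for x in range(W):
            y0, y1 = max(0, y - radius), min(H, y + radius + 1)
            x0, x1 = max(0, x - radius), min(W, x + radius + 1)
            yy, xx = np.mgrid[y0:y1, x0:x1]
            Gs = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_s ** 2))
            same = (h[y0:y1, x0:x1] == h[y, x]).astype(float)   # semantic delta
            dc = ((J[y0:y1, x0:x1] - J[y, x]) ** 2).sum(-1)
            Gc = np.exp(-dc / (2 * sigma_c ** 2))               # color kernel
            w = Gs * (same + mu * Gc)
            out[y, x] = (w * t_init[y0:y1, x0:x1]).sum() / w.sum()
    return out
```

Note how a pixel with a different semantic label can still contribute, but only through the color term scaled by `mu`, which is exactly the fallback behavior described above.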

Simulation of Fog Environment Based on Optical Model

The density of the fog is controlled by the attenuation coefficient β, which enters through the transmittance t(x) = exp(−β ℓ(x)), where ℓ(x) is the distance of the scene point from the camera.

To perform fog simulation, the clear-scene image R, the atmospheric light L, and a complete transmittance map are needed as input. Atmospheric light estimation, then depth denoising and completion, are performed to obtain an initial complete transmittance map t̂ from the noisy and incomplete input disparity map D using the recommended parameters. The dual-reference cross-bilateral filter is then applied to t̂ to compute the final transmittance map t, which is used to synthesize the foggy image I.
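The synthesis step itself reduces to the standard optical (atmospheric-scattering) model. A minimal numpy sketch, assuming homogeneous fog and illustrative parameter values:

```python
import numpy as np

def synthesize_fog(R, depth, L=0.9, beta=0.02):
    """Standard optical model for fog synthesis:
        I(x) = R(x) * t(x) + L * (1 - t(x)),   t(x) = exp(-beta * depth(x))
    R: clear-scene image in [0, 1]; depth: scene distance in metres;
    L: atmospheric light; beta: attenuation coefficient (fog density)."""
    t = np.exp(-beta * depth)
    if R.ndim == 3:
        t = t[..., None]      # broadcast over color channels
    return R * t + L * (1.0 - t)
```

Nearby pixels (t ≈ 1) keep the scene radiance R, while distant pixels (t → 0) fade to the atmospheric light L, which is why accurate, complete depth is the critical input here.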

(For atmospheric light estimation and depth denoising, refer to the paper Semantic Foggy Scene Understanding with Synthetic Data.)

Part II Semantic Segmentation of Scenes with Dense Fog

CMAda constitutes a more complex framework in that it uses both synthetic fog data and real fog data to jointly adapt the semantic segmentation model to dense fog, whereas the adaptation method in [15] uses only real data. Furthermore, assigning real fog images to the correct target fog domain via fog density estimation is another important and non-trivial component of CMAda, and a prerequisite for using these real images as training data.

Fog Density Estimation

 

First, it is infeasible to replace the SLIC superpixels with superpixels produced from the semantic annotation in the depth-plane-fitting step, because this generates very large superpixels for which the planarity assumption breaks down completely.

Second, one could omit the robust depth-plane-fitting step entirely and apply the dual-reference cross-bilateral filter directly to the incomplete depth map output by the outlier-detection step. However, this is very sensitive to outliers missed in the previous step; the RANSAC-based depth-plane fitting deals with these remaining outliers successfully.

Notation: x is the clear-weather (sunny) image; x′ the synthetic light-fog image; x′′ the synthetic dense-fog image; y the annotation.

The training data includes:

- synthetic light-fog images with labels;

- synthetic dense-fog images with labels;

- real light-fog images without annotations.

The main goal is to learn a mapping function from images to semantic labels, and to evaluate the accuracy of this mapping on real dense-fog images.

Since the real light-fog images carry no annotations, the mapping function is first trained on the synthetic light-fog images and their labels; it can then be applied to the real light-fog images to obtain pseudo-labels for them.

The resulting training objective combines the two datasets:

$$\min_{\phi} \; \sum_i L\big(\phi(x''_i), y_i\big) \;+\; \lambda \sum_j L\big(\phi(\tilde{x}_j), \hat{y}_j\big),$$

where $L$ is the cross-entropy loss, $\hat{y}_j$ are the pseudo-labels predicted on the real light-fog images $\tilde{x}_j$, and $\lambda$ is a constant balancing the weight of the two datasets.

The two datasets are mixed and this objective is minimized by standard supervised training of a CNN.
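The mixed two-dataset loss can be sketched in a few lines of numpy. This is an illustrative reconstruction of the form described in the text (cross-entropy on synthetic labels plus a weighted pseudo-label term); the function names and the balancing constant `lam` are assumptions.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy; logits (N, K), labels (N,) integer class ids."""
    z = logits - logits.max(1, keepdims=True)          # numerical stability
    logp = z - np.log(np.exp(z).sum(1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

def cmada_objective(logits_syn, y_syn, logits_real, y_pseudo, lam=0.5):
    """Supervised loss on synthetic dense-fog data plus a lam-weighted
    pseudo-label loss on the real fog data (form assumed from the text)."""
    return cross_entropy(logits_syn, y_syn) + lam * cross_entropy(logits_real, y_pseudo)
```

In an actual pipeline the logits would come from the segmentation CNN evaluated per pixel, and `lam` would be tuned to keep the noisy pseudo-labels from dominating the clean synthetic supervision.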

The CMAda method has a shortcoming: the mismatch in fog density between the real and synthetic training images hinders the search for the optimal solution. Therefore, the real fog density is estimated via the standard optical model.

Starting from the optical model, the influence of the true clear image R can be bypassed. Here the fog density is characterized by the attenuation coefficient β. Writing the real light-fog image as $I_1(x) = R(x)\,e^{-\beta_1 \ell(x)} + L\,\big(1 - e^{-\beta_1 \ell(x)}\big)$ and the target dense-fog image as $I_2$ with attenuation coefficient $\beta_2 > \beta_1$, substituting R out yields

$$I_2(x) = I_1(x)\, e^{-(\beta_2 - \beta_1)\,\ell(x)} + L\,\big(1 - e^{-(\beta_2 - \beta_1)\,\ell(x)}\big).$$

The expression for densifying the real data thus has the same form as the original optical model, with the clear image R replaced by the real light-fog image $I_1$ and β replaced by the density increment $\beta_2 - \beta_1$, so no clear-weather counterpart of the real image is needed.
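The fog densification step described above can be sketched directly from the optical model. This is an illustrative reconstruction; the function name and parameter values are assumptions.

```python
import numpy as np

def densify_fog(I1, depth, L=0.9, beta1=0.005, beta2=0.02):
    """Densify fog in an already-foggy image without recovering the clear
    scene R. Substituting R out of the optical model gives
        I2(x) = I1(x) * exp(-(beta2 - beta1) * depth)
                + L * (1 - exp(-(beta2 - beta1) * depth)).
    I1: real light-fog image; depth: scene distance; L: atmospheric light;
    beta1, beta2: source and target attenuation coefficients (beta2 > beta1)."""
    dt = np.exp(-(beta2 - beta1) * depth)
    if I1.ndim == 3:
        dt = dt[..., None]    # broadcast over color channels
    return I1 * dt + L * (1.0 - dt)
```

The densified real images can then be assigned to the same dense-fog target domain as the synthetic ones, which is what allows CMAda to mix the two sources of training data.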

The main methodological part of the paper ends here; the remainder presents the experimental results.

Compared with the original method, the results are improved to a clear extent.


Origin: blog.csdn.net/ltd0924/article/details/88808140