Point Cloud 3D Weather Data Enhancement: Fog Simulation on Real LiDAR Point Clouds for 3D Object Detection in Adverse Weather (ICCV 2021)

Disclaimer: This translation is only a personal study record


Abstract

  This work addresses the challenging task of LiDAR-based 3D object detection in foggy weather. Collecting and annotating data in such scenarios is very time-, labor-, and cost-intensive. In this paper, we tackle this problem by simulating physically accurate fog in clear-weather scenes, so that the abundant existing real-world datasets captured in clear weather can be repurposed for our task. Our contributions are twofold: 1) We develop a physically accurate fog simulation method that works on any LiDAR dataset. This yields large-scale foggy training data at no extra cost. This partially synthetic data can be used to improve the robustness of several perception methods, such as 3D object detection and tracking or simultaneous localization and mapping. 2) Through extensive experiments with several state-of-the-art detection methods, we show that our fog simulation can be leveraged to significantly improve 3D object detection performance in the presence of fog. Thus, we are the first to provide strong 3D object detection baselines on the Seeing Through Fog (STF) dataset. Our code is available at www.trace.ethz.ch/lidar_fog_simulation.

1 Introduction

  Light detection and ranging (LiDAR) is critical to enabling safe autonomous vehicles, because LiDAR measures the precise distance of objects from the sensor, which cameras cannot measure directly. As a result, LiDAR has entered many application domains, including detection [23, 34], tracking [7, 50], localization [25, 8], and mapping [48, 15]. Despite the benefit of measuring precise depth information, LiDAR has a significant disadvantage. Unlike automotive radar, LiDAR sensors emit pulses of light in the invisible near-infrared (NIR) spectrum (typically at wavelengths of 850, 903, or 905 nm [4]) that do not penetrate water particles. This means that as soon as fog droplets are present in the air, the light pulses emitted by the sensor are backscattered and attenuated. Attenuation reduces the signal power received from the solid targets in the line of sight that should actually be measured, while backscatter creates spurious peaks in the received signal power at incorrect distances. Therefore, whenever fog is present at capture time, the acquired LiDAR point cloud contains spurious returns. This is a major challenge for most outdoor applications, as they usually require robust performance in all weather conditions.


Figure 1: LiDAR returns caused by fog in the scene. (a) shows the strongest and (b) the last return, color-coded by LiDAR channel. Ground returns are removed for better visibility of the points introduced by fog. Best viewed in color (red = low channel, cyan = high channel, green = 3D bounding box annotations, gray = ego vehicle dimensions).

  In recent years, several LiDAR datasets for 3D object detection have been proposed [10, 3, 19, 5, 39, 27, 11, 16]. Although many of them contain diverse driving scenarios, none of them allow an evaluation under different types of adverse weather. Only recently have the Canadian Adverse Driving Conditions (CADC) dataset [28] and the Seeing Through Fog (STF) dataset [2] addressed the need for such evaluations. While CADC focuses on snowfall, STF covers evaluations in fog, rain, and snowfall. Hence, there is still no large amount of LiDAR fog data available for training deep neural networks.

  The reason is obvious: collecting and annotating large-scale datasets is inherently time-, labor- and cost-intensive, not to mention doing so in inclement weather conditions.

  This is exactly what our work addresses. In Section 3, we propose a physics-based fog simulation that converts real clear-weather LiDAR point clouds into foggy point clouds. In particular, we model the transmission of LiDAR pulses with a standard linear system [30]. We distinguish between the clear-weather and foggy cases through the impulse response of this system, and establish a formal link between the received responses under fog and the corresponding responses under clear weather. This link directly yields a transformation of the range and intensity of each original clear-weather point, so that the new range and intensity correspond to a measurement taken with fog present in the scene. We then show in Section 4 that several state-of-the-art 3D object detection pipelines can be trained on our partially synthetic data to improve robustness on real foggy data. This scheme has previously been applied to semantic segmentation of images [32, 31, 13], and we show that it can also be applied successfully to LiDAR data and 3D object detection.

  For our experiments, we simulate fog on the clear-weather training set of STF [2] and evaluate on its real fog test sets. Figure 1 shows an example scene from the STF dense fog test set, where the noise introduced by the fog is clearly visible in the LiDAR data. The authors of STF [2] used a Velodyne HDL-64E as the main LiDAR sensor. The sensor has 64 channels and a so-called dual mode, in which it measures not only the strongest but also the last return of each individually emitted light pulse. Even though the last return contains less severe noise, fog can still cause a large number of spurious returns. Therefore, even in this dual mode, the sensor cannot fully "see through the fog".

  Figure 2 shows an interesting property of the noise introduced by fog: it is not evenly distributed around the sensor. Instead, the presence of noise depends on whether there are objects within a certain distance of the sensor in the line of sight. If there is a solid target at a moderate distance, there are few, if any, spurious returns from the individual pulses. If, on the other hand, there is no target in the line of sight below a certain distance, there are many spurious returns caused by the fog. This becomes evident in the example of Figure 2, with a hill on the left side of the road and a clearing behind a guardrail on the right. Only in the latter case does fog-induced noise appear in the measurements. The theoretical formulation in Section 3 explains this behavior.

  Incidentally, similar sensor noise can also be caused by exhaust fumes, but if the future of transportation is electric, at least that problem might vanish into thin air.


Figure 2: LiDAR returns caused by fog in the scene, color-coded by LiDAR channel in (a) and by intensity in (b). Ground returns are removed for better visibility of the points introduced by fog. Best viewed in color; same color coding as in Figure 1.

2. Related work

2.1 The impact of adverse weather on lidar

  Early works include the one by Kim et al. [20], who in 2001 studied the effect of fog and haze on optical wireless communication in the near-infrared spectrum. In 2011, Rasshofer et al. [30] then studied the impact of weather phenomena on automotive LiDAR systems. Adverse weather conditions have received more attention in recent years, and there are many further works studying the degradation of LiDAR data under different adverse weather conditions [45, 9, 21, 17, 14, 44, 24, 22]. More recently, in 2020, the authors of LIBRE [4] tested several LiDAR sensors indoors under rain and fog. They thereby provide important insights into the current robustness of individual sensors under challenging weather conditions.

2.2 Severe weather and lidar simulation

  In the automotive context, artificial fog simulation has so far largely been limited to image-based methods. Sakaridis et al. [32], for example, created a foggy version of the semantic segmentation dataset Cityscapes [6] by exploiting the depth information provided in the original dataset, and Hahner et al. [13] created a foggy version of the purely synthetic dataset Synscapes [46]. Sakaridis et al. also released ACDC [33], a dataset providing pixel-level semantic annotations for the 19 Cityscapes classes under adverse conditions. Only recently, Bijelic et al. [2] proposed a first-order approximation to model fog in an automotive LiDAR setting. However, their simulation was only meant to reproduce the measurements they made in a 30-meter-long fog chamber.

  Goodin et al. [12] developed a model to quantify the performance degradation of LiDAR in rain and incorporated their model into simulations for testing advanced driver assistance systems (ADAS). Michaud et al. [26] characterized the behavior of LiDAR sensors in snowy conditions, and Tremblay et al. [41] proposed a method for rendering rain on images to evaluate and improve robustness to rainy weather.

2.3 3D object detection

  After the release of many LiDAR datasets [10, 3, 19, 5, 39, 27, 11, 16, 28, 2] in the past few years, 3D object detection has received increasing attention in the race towards autonomous driving. While camera-based methods such as the one by Simonelli et al. [38] and gated-camera methods such as Gated3D [18] exist, the top positions on all dataset leaderboards are usually occupied by LiDAR-based methods.

  Seminal works on LiDAR-based 3D object detection include PointNet [29] and VoxelNet [49]. PointNet [29] is a neural network that directly processes point clouds without prior quantization of 3D space into voxels. Most notably, the architecture is invariant to input permutations, so the order in which points are fed into the network does not affect its output. VoxelNet [49] builds on the idea of quantizing 3D space into equally sized voxels and then uses PointNet-like layers to process each voxel. However, it is a rather heavy architecture due to its computationally expensive 3D convolutions.

  This is where PointPillars [23] comes in: it drops the quantization along the height dimension and instead processes the point cloud in a grid of vertical pillars. PointPillars [23] is based on the SECOND [47] codebase, but thanks to the pillar idea it can rely on faster 2D convolutions and achieves very competitive results at higher speed. Its successor, PointPainting [43], additionally exploits image segmentation results and "paints" the points with pseudo class labels before processing them with the PointPillars [23] architecture.

  Several recent milestones in 3D object detection have been set by Shi et al. PointRCNN [36] is a two-stage architecture, in which the first stage generates 3D bounding box proposals from the point cloud in a bottom-up manner and the second stage refines these proposals in canonical coordinates. Part-A² [37] is part-aware in the sense that the network considers which part of an object a point belongs to; exploiting these intra-object part locations leads to better results. PV-RCNN [34] and its follow-up PV-RCNN++ [35] are their latest works, in which they jointly process (coarse) voxels and the raw points of the point cloud.

3. Fog simulation on real LiDAR point cloud

  To simulate the effect of fog on a real-world LiDAR point cloud recorded in clear weather, we need to resort to a model of the optical system that underlies the functionality of the LiDAR sensor's emitter and receiver. In particular, we examine a single measurement/point, model the full signal of received power as a function of distance, and recover its exact form corresponding to the original clear-weather measurement. This allows us to operate in the signal domain and achieve transitions from clear weather to fog by modifying the part of the impulse response associated with the optical channel (i.e. the atmosphere). In the remainder of this section, we first provide a background on the LiDAR sensor optical system, and then introduce a fog simulation algorithm based on this system.

3.1 Background of Lidar Optical Model

  Rasshofer et al. [30] introduced a simple linear system model to describe the signal power received by a LiDAR receiver, which is valid for inelastic scattering. In particular, the range-dependent received signal power $P_R$ is modeled as a temporal convolution between the time-dependent transmitted signal power $P_T$ and the impulse response $H$ of the environment:

$$P_R(R) = C_A \int_0^{2R/c} P_T(t)\, H\!\left(R - \frac{ct}{2}\right) \mathrm{d}t \tag{1}$$

where c is the speed of light and $C_A$ is a system constant that is independent of time and range. How $C_A$ is handled in our fog simulation is explained in Section 3.2.

  We proceed with the description and modeling of the remaining terms in (1). The temporal characteristics of the transmitted pulse of an automotive LiDAR sensor can be modeled [30] by a $\sin^2$ function:

$$P_T(t) = \begin{cases} P_0 \sin^2\!\left(\dfrac{\pi t}{2\tau_H}\right) & 0 \le t \le 2\tau_H \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

where $P_0$ denotes the peak power of the pulse and $\tau_H$ the half-power pulse width. Typical values for $\tau_H$ lie between 10 and 20 ns [30]. In (2), the time origin is set to the start (rising edge) of the pulse, so if a LiDAR sensor reports distances not with respect to the rising edge but with respect to the maximum of the corresponding peak in the signal, the required correction can be applied. Since reporting the rising edge is common in embedded signal processing, we keep this convention in all equations and show later where such a correction can be made.
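As a small illustration of (2), the pulse shape can be evaluated numerically. This is only a sketch: the function name and defaults are illustrative, with τ_H = 20 ns taken from the experimental setup in Section 4.2.

```python
import numpy as np

def transmitted_pulse(t, p_0=1.0, tau_h=20e-9):
    """Evaluate P_T(t) = P_0 * sin^2(pi * t / (2 * tau_H)) on 0 <= t <= 2*tau_H, else 0."""
    t = np.asarray(t, dtype=float)
    inside = (t >= 0.0) & (t <= 2.0 * tau_h)
    return np.where(inside, p_0 * np.sin(np.pi * t / (2.0 * tau_h)) ** 2, 0.0)

# The pulse peaks at t = tau_H; its full width at half maximum equals tau_H,
# which is why tau_H is called the half-power pulse width.
t = np.linspace(0.0, 40e-9, 9)
print(transmitted_pulse(t, p_0=1.0, tau_h=20e-9))
```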

  The spatial impulse response function H of the environment can be modeled as the product of the individual impulse responses of the optical channel $H_C$ and the target $H_T$:

$$H(R) = H_C(R)\, H_T(R) \tag{3}$$

The impulse response of the optical channel is

$$H_C(R) = \frac{T^2(R)\,\xi(R)}{R^2} \tag{4}$$

where T(R) denotes the total one-way transmission loss and ξ(R) denotes the crossover function that defines the ratio of the area illuminated by the transmitter to the area observed by the receiver, as shown in Figure 3. Since the full details of the optical configuration of commercial LiDAR sensors are usually not publicly available (i.e. the precise values of $R_1$ and $R_2$ are unknown), in our case ξ(R) is a piecewise-linear approximation defined as

$$\xi(R) = \begin{cases} 0 & R \le R_1 \\ \dfrac{R - R_1}{R_2 - R_1} & R_1 < R \le R_2 \\ 1 & R > R_2 \end{cases} \tag{5}$$

The total one-way transmission loss T(R) is defined as

$$T(R) = \exp\!\left(-\int_0^R \alpha(r)\, \mathrm{d}r\right) \tag{6}$$

where α(r) denotes the spatially varying attenuation coefficient. In our simulation we assume a homogeneous optical medium, i.e. α(r) = α, so that (6) yields

$$T(R) = \exp(-\alpha R) \tag{7}$$

The attenuation coefficient α depends on the weather at the time of measurement and increases as the visibility decreases. Therefore, for the same 3D scene, the impulse response of the optical channel $H_C$ varies with visibility.
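The channel terms (4)-(7) can likewise be sketched in a few lines of Python. The values chosen for R_1 and R_2 are illustrative placeholders, since the true optical geometry of commercial sensors is not public, and the function names are not taken from the paper.

```python
import numpy as np

def crossover(r, r_1=0.9, r_2=1.0):
    """Piecewise-linear crossover function xi(R) from (5):
    0 up to R_1, a linear ramp between R_1 and R_2, and 1 beyond R_2."""
    r = np.asarray(r, dtype=float)
    return np.clip((r - r_1) / (r_2 - r_1), 0.0, 1.0)

def transmission(r, alpha):
    """One-way transmission loss T(R) = exp(-alpha * R) from (7), homogeneous medium."""
    return np.exp(-alpha * np.asarray(r, dtype=float))

def channel_impulse_response(r, alpha, r_1=0.9, r_2=1.0):
    """H_C(R) = T(R)^2 * xi(R) / R^2 from (4): two-way loss, crossover, inverse-square."""
    r = np.asarray(r, dtype=float)
    return transmission(r, alpha) ** 2 * crossover(r, r_1, r_2) / r ** 2

# In fog (alpha = 0.06) the channel response at 30 m is attenuated by exp(-2 * 0.06 * 30) ~ 0.027
# relative to clear weather (alpha = 0).
print(channel_impulse_response(30.0, alpha=0.0), channel_impulse_response(30.0, alpha=0.06))
```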

  The last term of the optical model (1) that needs to be modeled is the impulse response $H_T$ of the target. We need to distinguish the cases for $H_T$ according to the weather conditions, since the target composition of the same 3D scene differs between fog and clear weather. For the same 3D scene, this distinction lets us construct a direct relationship between the received response $P_R$ in clear weather and in fog, which in turn allows us to simulate fog in real clear-weather LiDAR measurements.

3.2 LiDAR fog simulation

  We now instantiate the optical model of Section 3.1 for the individual cases of clear weather and fog, in terms of the impulse response terms $H_C$ and $H_T$.

  In clear weather, the attenuation coefficient α equals 0, so

$$H_C^{\text{clear}}(R) = \frac{\xi(R)}{R^2}$$

Moreover, in clear weather the targets consist only of solid objects on which the LiDAR pulse is reflected. This type of target is called a hard target [30].


Figure 3: Schematic of a LiDAR sensor in which the transmitter Tx and the receiver Rx do not share coaxial optics but have parallel axes. This is known as a bistatic beam configuration. Figure adapted from [30].

  The impulse response $H_T$ of a hard target at range $R_0$ takes the form of a Dirac delta function,

$$H_T^{\text{hard}}(R) = \beta_0\, \delta(R - R_0)$$

where $\beta_0$ denotes the differential reflectivity of the target. If we only consider diffuse reflection (a Lambertian surface), $\beta_0$ is given by

[equation image: $\beta_0$ for a Lambertian surface]

Since $R_2$ is less than two meters in practice [42], we can safely assume $R_2 \ll R_0$ and thus $\xi(R_0) = 1$. Therefore, starting from (1) and using the property $f(x)\,\delta(x - x_0) = f(x_0)\,\delta(x - x_0)$, the received signal power for a clear-weather LiDAR point with range measurement $R_0$ can be computed in closed form:

$$P_{R,\text{clear}}(R) = \frac{2\, C_A P_0\, \beta_0}{c\, R_0^2}\, \sin^2\!\left(\frac{\pi (R - R_0)}{c\, \tau_H}\right), \qquad R_0 \le R \le R_0 + c\,\tau_H,$$

and zero otherwise, which reaches its maximum at $R = R_0 + \frac{c\,\tau_H}{2}$. Therefore, as mentioned in Section 3.1, the response $P_{R,\text{clear}}(R)$ can simply be shifted by $-\frac{c\,\tau_H}{2}$ here, if necessary.

  We now establish the transformation from $P_{R,\text{clear}}(R)$ to the corresponding received response $P_{R,\text{fog}}(R)$ under fog. While the same hard target is still present, since the 3D scene is the same, the fog now makes an additional contribution: it constitutes a soft target [30] that provides distributed scattering in the impulse response $H_T$.

  The impulse response of this soft fog target, $H_T^{\text{soft}}$, has the form of a Heaviside function:

$$H_T^{\text{soft}}(R) = \beta\, U(R_0 - R)$$

where β denotes the backscattering coefficient, which is constant under our homogeneity assumption, and U is the Heaviside function.

  The coexistence of hard and soft targets can be modeled by taking the superposition of their respective impulse responses:
$$H_T(R) = H_T^{\text{hard}}(R) + H_T^{\text{soft}}(R) = \beta_0\, \delta(R - R_0) + \beta\, U(R_0 - R)$$

  We observe that the spatial impulse response is more complex in fog than in clear weather, but it can still be decomposed into two terms corresponding to the hard and the soft target, leading to a decomposition of the received response:

$$P_{R,\text{fog}}(R) = P^{\text{hard}}_{R,\text{fog}}(R) + P^{\text{soft}}_{R,\text{fog}}(R)$$

For the hard-target term, computing the received response $P^{\text{hard}}_{R,\text{fog}}$ via (1) and again using the assumption $R_2 \ll R_0$, we obtain

$$P^{\text{hard}}_{R,\text{fog}}(R) = \exp(-2\alpha R_0)\, P_{R,\text{clear}}(R),$$

i.e. an attenuated version of $P_{R,\text{clear}}$. The soft-target term, on the other hand, is

$$P^{\text{soft}}_{R,\text{fog}}(R) = C_A\, P_0\, \beta\; I(R, R_0, \alpha, \tau_H)$$

and does not have a closed-form expression, since the corresponding integral $I(R, R_0, \alpha, \tau_H)$ on the right-hand side of (18) cannot be computed analytically.


Figure 4: The two terms of the received signal power $P_{R,\text{fog}}$ for a single LiDAR pulse, associated with the pulse reflected by the hard target ($P^{\text{hard}}_{R,\text{fog}}$) and with the soft fog target ($P^{\text{soft}}_{R,\text{fog}}$), plotted over the range domain. In (a) the fog is not thick enough to create a fog return, while in (b) it is thick enough that the fog return eclipses the hard target at $R_0 = 30$ m.

  However, for given $\tau_H$ and α, $I(R, R_0, \alpha, \tau_H)$ can be computed numerically for any fixed value of R. We perform this numerical integration with Simpson's 1/3 rule and show indicative examples of the resulting $P^{\text{soft}}_{R,\text{fog}}$ profiles in Figure 4. Depending on the distance of the hard target from the sensor, the soft-target term of the response may exhibit a larger maximum than the hard-target term, which means that the range measurement changes due to the presence of fog and becomes equal to the location of the maximum of the soft-target term.

  The formulation we have developed provides a simple algorithm for fog simulation on clear-weather point clouds. The input parameters of the algorithm are α, β, $\beta_0$, and $\tau_H$. The main input is a clear-weather point cloud, in which each point $p \in \mathbb{R}^3$ has a measured intensity i. We assume that the sensor's intensity reading is a linear function of the maximum of the received signal power $P_{R,\text{clear}}$ corresponding to each measurement. The procedure applied to each point p is given in Algorithm 1. Note that we add some noise to the range of the points introduced by $P^{\text{soft}}_{R,\text{fog}}$ (lines 14-15), since otherwise all points introduced by $P^{\text{soft}}_{R,\text{fog}}$ would lie precisely on a circle around the LiDAR sensor.
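Since Algorithm 1 itself is only available as an image here, the following is a heavily simplified sketch of the per-point procedure described above. It is not the implementation released by the authors: the helper names, the candidate-range grid, the noise level, and in particular `soft_gain` (a calibration constant introduced here to map the soft-target integral onto the sensor's linear intensity scale) are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import simpson

C = 299_792_458.0  # speed of light [m/s]

def crossover(r, r_1=0.9, r_2=1.0):
    # Piecewise-linear crossover function xi(R); R_1 and R_2 are placeholder values.
    return np.clip((r - r_1) / (r_2 - r_1), 0.0, 1.0)

def soft_target_profile(r_grid, r_0, alpha, tau_h=20e-9, n_steps=200):
    """Numerically evaluate the soft-target integral (which has no closed form)
    on a grid of candidate ranges R, using Simpson's 1/3 rule over the pulse duration."""
    t = np.linspace(0.0, 2.0 * tau_h, n_steps + 1)      # support of the sin^2 pulse
    pulse = np.sin(np.pi * t / (2.0 * tau_h)) ** 2
    profile = np.zeros(len(r_grid))
    for k, r in enumerate(r_grid):
        r_prime = r - C * t / 2.0                        # range of the scatterer hit at time t
        r_safe = np.maximum(r_prime, 1e-9)               # avoid division by zero below
        valid = (r_prime > 0.0) & (r_prime <= r_0)       # fog only fills the space up to the hard target
        integrand = np.where(
            valid,
            pulse * np.exp(-2.0 * alpha * r_safe) * crossover(r_safe) / r_safe ** 2,
            0.0,
        )
        profile[k] = simpson(integrand, x=t)
    return profile

def foggify_point(point_xyz, intensity, alpha, beta, beta_0,
                  soft_gain=1.0, noise_std=0.5, rng=None):
    """Per-point sketch: keep an attenuated hard return, or replace it by a closer
    fog return whenever the soft-target response has the larger maximum."""
    rng = np.random.default_rng() if rng is None else rng
    point_xyz = np.asarray(point_xyz, dtype=float)
    r_0 = float(np.linalg.norm(point_xyz))

    hard = intensity * np.exp(-2.0 * alpha * r_0)        # attenuated response of the original hard target

    r_grid = np.linspace(1.0, r_0, 100)                  # candidate ranges for a fog return
    soft = soft_gain * (beta / beta_0) * intensity * soft_target_profile(r_grid, r_0, alpha)
    k = int(np.argmax(soft))

    if soft[k] > hard:                                   # the fog return eclipses the hard target
        # Jitter the range so that simulated fog points do not all lie on a circle.
        new_range = float(np.clip(r_grid[k] + rng.normal(0.0, noise_std), 1.0, r_0))
        return point_xyz * (new_range / r_0), float(soft[k])
    return point_xyz, float(hard)
```

After all points of a scan have been processed in this way, the resulting intensities would still have to be rescaled linearly to the sensor's [0, 255] range, as described in Section 4.2.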

4. Results

4.1 Fog Simulation

  A qualitative comparison between our fog simulation and the one in [2] can be found in Fig. 5. We can see that, in contrast to the fog simulation in [2], where the response of the soft target is only modeled heuristically, our fog simulation explicitly models the soft-target response $P^{\text{soft}}_{R,\text{fog}}$. To highlight this difference, we deliberately chose a clear-weather scene with a layout similar to the real fog scene depicted in Fig. 2. Only our fog simulation (most clearly visible in the bottom-right visualization of Figure 5) reproduces the characteristic semicircular fog noise. In the supplementary material we show comparisons for additional values of α.


Figure 5: Comparison of our fog simulation (bottom) with that in [2] (middle), with α set to 0.06, corresponding to a meteorological optical range (MOR) ≈50m. In the left column, the point cloud is color-coded by intensity, and in the right column, it is color-coded by height (z-value). The top row shows the raw point cloud.

4.2 3D Object Detection in Fog

  Our experimental setup is based on the OpenPCDet [40] codebase. It provides implementations of the 3D object detection methods PV-RCNN [34], PointRCNN [36], SECOND [47], Part-A² [37], and PointPillars [23]. For our experiments, we train all these methods from scratch for 80 epochs on the STF [2] dataset with their standard training strategies. We also tried fine-tuning from KITTI [10] weights (KITTI uses the same LiDAR sensor), but we did not see any benefit besides faster network convergence, so all numbers reported in Section 4.2 are from models trained from scratch on the STF [2] clear-weather training set, which consists of 3469 scenes. The STF [2] clear-weather validation and test sets consist of 781 and 1847 scenes, respectively. The main benefit of using STF [2] in our experiments, however, is that it comes with test sets for different adverse weather conditions. In particular, it provides a light fog test set of 946 scenes and a dense fog test set of 786 scenes. This allows us to test the effectiveness of our fog simulation pipeline on real fog data.

  For the fog simulation, we assume a Velodyne HDL-64E sensor with a half-power pulse width $\tau_H$ of 20 ns and set β to $\frac{0.046}{MOR}$, as described by Rasshofer et al. [30]. We empirically set $\beta_0$ to $\frac{1\times10^{-6}}{\pi}$ for all points, to obtain an intensity distribution similar to what we observe in the real foggy point clouds of STF [2]. Because the Velodyne HDL-64E uses an unknown internal dynamic gain mechanism, it can provide intensity values over the full value range [0, 255] at every time step. To mimic this behavior and again cover the entire value range, we linearly scale up the intensity values after modifying them via Algorithm 1.
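The parameter choices above can be collected in a small helper. The α ↔ MOR mapping via Koschmieder's relation MOR = −ln(0.05)/α is an assumption introduced here for illustration; it reproduces the pairs quoted in the text (e.g. α = 0.06 gives MOR ≈ 50 m), but the function name and structure are not taken from the paper.

```python
import numpy as np

def fog_parameters(alpha, tau_h=20e-9):
    """Collect the simulation parameters quoted in the text for a given alpha."""
    mor = -np.log(0.05) / alpha                 # meteorological optical range [m] (Koschmieder)
    return {
        "alpha": alpha,                         # attenuation coefficient [1/m]
        "mor": mor,
        "beta": 0.046 / mor,                    # backscattering coefficient, Rasshofer et al. [30]
        "beta_0": 1e-6 / np.pi,                 # differential reflectivity, set empirically in the text
        "tau_h": tau_h,                         # half-power pulse width [s]
    }

print(fog_parameters(0.06))                     # mor ~ 49.9 m
```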

4.2.1 Quantitative results

  For our reported numbers, we choose the snapshot that performs best on the clear-weather validation set and test it on the test splits described above. In Table 1, we report the 3D average precision (AP) on the STF [2] dense fog test split for the car, cyclist, and pedestrian categories, as well as the mean average precision (mAP) over the three categories. Note that there is always one model predicting all three classes, not one model per class. All AP and mAP numbers reported in this paper are calculated using the 40 recall positions suggested in [38]. We can see that, for all methods, the training runs using our fog simulation outperform both the clear-weather baseline and the training runs using the fog simulation of [2], in terms of mAP over all classes and for the main car category.

  As a second baseline, we evaluate the clear-weather model after applying an additional preprocessing step at test time, in which we feed the network only the points that are present in both the strongest and the last return measurements of the same scene. We refer to this filter as the "strongest last filter". The idea behind this filter stems from the fact that all points discarded by it must be noise (most likely introduced by fog in the scene) and cannot come from physical objects of interest. We can see that this filter improves the performance of the clear-weather model in most cases, but in most cases it still does not outperform any of the fog-simulation runs. One might also notice that the performance on cyclists is generally lower than on the other two classes. We attribute this to the fact that the cyclist class is rather underrepresented compared to the other two classes in the STF [2] dataset (e.g., the dense fog test split contains only 28 cyclists, compared to 490 pedestrians and 1186 cars). For the pedestrian (and cyclist) class, we still achieve state-of-the-art performance for three out of the five methods. For the training runs using our fog simulation or the fog simulation in [2], we uniformly sample α for each training example from [0, 0.005, 0.01, 0.02, 0.03, 0.06], which corresponds to approximately [∞, 600, 300, 150, 100, 50] m MOR. Attempting more sophisticated techniques, such as curriculum learning [1], is left for future work.
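One possible way to wire the uniform α sampling described above into a training data loader is sketched below; the function and variable names are placeholders, and `simulate_fog` stands for the full fog simulation routine (e.g. the per-point sketch from Section 3.2 applied to every point of the scan).

```python
import random

# alpha values and their approximate MOR in meters, as used for training
FOG_ALPHAS = [0.0, 0.005, 0.01, 0.02, 0.03, 0.06]   # ~[inf, 600, 300, 150, 100, 50] m MOR

def augment_with_fog(points, intensities, simulate_fog):
    """Draw one attenuation coefficient per training example; alpha = 0 keeps the
    clear-weather scan, any other value runs the fog simulation on the whole scan."""
    alpha = random.choice(FOG_ALPHAS)
    if alpha == 0.0:
        return points, intensities
    return simulate_fog(points, intensities, alpha)
```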


Table 1: 3D Average Precision (AP) results for STF [2] dense fog test split.
†Sunny weather baseline ‡Sunny weather baseline (same model as †), with the strongest last filter applied at test time
*Fog simulation applied to each training example, with α uniformly sampled from [0, 0.005, 0.01, 0.02, 0.03, 0.06]


Table 2: Results on all relevant STF [2] test splits for car 3D AP@0.5.
†Sunny weather baseline ‡Sunny weather baseline (same model as †), with the strongest last filter applied at test time
*Fog simulation applied to each training example, with α uniformly sampled from [0, 0.005, 0.01, 0.02, 0.03, 0.06]


Figure 6: The top row shows predictions of PV-RCNN [34] trained on the original clear-weather data (first row in the table above), and the bottom row shows predictions of PV-RCNN [34] trained on a mixture of clear-weather and simulated fog data (fourth row in the table above), for three example scenes from the STF [2] dense fog test split. Ground-truth boxes are colored, model predictions are white. Best viewed on screen (and zoomed in).

  In Table 2, we show the 3D AP for the main car class on the dense fog, light fog, and clear-weather test sets, as well as the mAP over these three weather conditions. We can see that in dense fog, for every method, the training run using our fog simulation outperforms all other training runs, which is exactly the goal of our physically accurate fog simulation. We can further see that mixing simulated fog into training does not affect clear-weather performance much, so in most cases we also achieve the best mAP over all three weather conditions.

  In the supplementary material, we discuss why we choose to focus on relaxed intersection over union (IoU) thresholds and present results using the official KITTI [10] evaluation stringency. Additionally, we present 2D and bird's eye view (BEV) results, along with further details on the STF [2] dataset.

4.2.2 Qualitative results

  In Figure 6, we show three example scenes in which we significantly outperform the clear-weather baseline. We can see that our model trained with simulated fog has fewer false positives (left), more true positives (middle), and overall more accurate predictions (right), with the same confidence threshold applied in every case for a fair comparison.

5 Conclusion

  In this work, we introduce a physically accurate method for converting real-world clear-weather point clouds to foggy point clouds. In this process, we have full control over all parameters involved in the physics equations. Not only does this allow us to realistically simulate fog of any density, it also allows us to simulate the effect of fog on any lidar sensor currently available on the market.

  We show that by using this physically accurate fog simulation, we can improve the performance of several state-of-the-art 3D object detection methods on point clouds collected in real dense fog. We would expect even larger performance gains from our fog simulation if LiDAR data were annotated in 360° and not just in the field of view of a single forward-facing camera, but there is currently no publicly available dataset with which to validate this assumption.

  We believe that our physically accurate fog simulation is not only suitable for 3D object detection tasks. Therefore, we hope that our fog simulation will find its way into many other tasks and works as well.

Acknowledgements: This work was funded by Toyota Motor Europe via the research project TRACE Zurich.

References

[1] Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In International Conference on Machine Learning (ICML), 2009.
[2] Mario Bijelic, Tobias Gruber, Fahim Mannan, Florian Kraus, Werner Ritter, Klaus Dietmayer, and Felix Heide. Seeing through fog without seeing fog: Deep multimodal sensor fusion in unseen adverse weather. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[3] Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuscenes: A multimodal dataset for autonomous driving. arXiv preprint 1903.11027, 2019.
[4] A. Carballo, J. Lambert, A. Monrroy, D. Wong, P. Narksri, Y. Kitsukawa, E. Takeuchi, S. Kato, and K. Takeda. Libre: The multiple 3d lidar dataset. In IEEE Intelligent Vehicles Symposium (IV), 2020.
[5] M. Chang, J. Lambert, P. Sangkloy, J. Singh, S. Bak, A. Hartnett, D. Wang, P. Carr, S. Lucey, D. Ramanan, and J. Hays. Argoverse: 3d tracking and forecasting with rich maps. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[6] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The cityscapes dataset for semantic urban scene understanding. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[7] A. Dewan, T. Caselitz, G. D. Tipaldi, and W. Burgard. Motion-based detection and tracking in 3d lidar scans. In IEEE International Conference on Robotics and Automation (ICRA), 2016.
[8] M. Elhousni and X. Huang. A survey on 3d lidar localization for autonomous vehicles. In 2020 IEEE Intelligent Vehicles Symposium (IV), 2020.
[9] A. Filgueira, Higinio Gonzalez, Susana Lagüela, Lucia Díaz Vilariño, and Pedro Arias. Quantifying the influence of rain in lidar performance. Measurement, 95, 2016.
[10] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? the kitti vision benchmark suite. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[11] Jakob Geyer, Yohannes Kassahun, Mentar Mahmudi, Xavier Ricou, Rupesh Durgesh, Andrew S. Chung, Lorenz Hauswald, Viet Hoang Pham, Maximilian Mühlegg, Sebastian Dorn, Tiffany Fernandez, Martin Jänicke, Sudesh Mirashi, Chiragkumar Savani, Martin Sturm, Oleksandr Vorobiov, Martin Oelker, Sebastian Garreis, and Peter Schuberth. A2d2: Audi autonomous driving dataset. arXiv preprint 2004.06320, 2020.
[12] Christopher Goodin, Daniel Carruth, Matthew Doude, and Christopher Hudson. Predicting the influence of rain on lidar in adas. Electronics, 8, 2019.
[13] Martin Hahner, Dengxin Dai, Christos Sakaridis, Jan-Nico Zaech, and Luc Van Gool. Semantic understanding of foggy scenes with purely synthetic data. In IEEE International Conference on Intelligent Transportation Systems (ITSC), 2019.
[14] R. Heinzler, P. Schindler, J. Seekircher, W. Ritter, and W. Stork. Weather influence and classification with automotive lidar sensors. In IEEE Intelligent Vehicles Symposium (IV), 2019.
[15] Wolfgang Hess, Damon Kohler, Holger Rapp, and Daniel Andor. Real-time loop closure in 2d lidar slam. In IEEE International Conference on Robotics and Automation (ICRA), 2016.
[16] X. Huang, P. Wang, X. Cheng, D. Zhou, Q. Geng, and R. Yang. The apolloscape open dataset for autonomous driving and its application. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 42, 2020.
[17] Maria Jokela, Matti Kutila, and Pasi Pyykönen. Testing and validation of automotive point-cloud sensors in adverse weather conditions. Applied Sciences, 9, 2019.
[18] Frank D. Julca-Aguilar, Jason Taylor, Mario Bijelic, Fahim Mannan, Ethan Tseng, and Felix Heide. Gated3d: Monocular 3d object detection from temporal illumination cues. arXiv preprint 2102.03602, 2021.
[19] R. Kesten, M. Usman, J. Houston, T. Pandya, K. Nadhamuni, A. Ferreira, M. Yuan, B. Low, A. Jain, P. Ondruska, S. Omari, S. Shah, A. Kulkarni, A. Kazakova, C. Tao, L. Platinsky, W. Jiang, and V. Shet. Lyft level 5 perception dataset 2020. https://level5.lyft.com/dataset, 2019.
[20] Isaac I. Kim, Bruce McArthur, and Eric J. Korevaar. Comparison of laser beam propagation at 785 nm and 1550 nm in fog and haze for optical wireless communications. In Optical Wireless Communications III, volume 4214. International Society for Optics and Photonics, SPIE, 2001.
[21] M. Kutila, P. Pyykönen, H. Holzhüter, M. Colomb, and P. Duthon. Automotive lidar performance verification in fog and rain. In International Conference on Intelligent Transportation Systems (ITSC), 2018.
[22] M. Kutila, P. Pyykönen, W. Ritter, O. Sawade, and B. Schäufele. Automotive lidar sensor development scenarios for harsh weather conditions. In IEEE International Conference on Intelligent Transportation Systems (ITSC), 2016.
[23] Alex H. Lang, Sourabh Vora, Holger Caesar, Lubing Zhou, Jiong Yang, and Oscar Beijbom. Pointpillars: Fast encoders for object detection from point clouds. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[24] Y. Li, P. Duthon, M. Colomb, and J. Ibanez-Guzman. What happens for a tof lidar in fog? IEEE Transactions on Intelligent Transportation Systems, 2020.
[25] W. Lu, Y. Zhou, G. Wan, S. Hou, and S. Song. L3-net: Towards learning based lidar localization for autonomous driving. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[26] S. Michaud, Jean-François Lalonde, and P. Giguère. Towards characterizing the behavior of lidars in snowy conditions. In International Conference on Intelligent Robots and Systems (IROS), 2015.
[27] Abhishek Patil, Srikanth Malla, Haiming Gang, and Yi-Ting Chen. The h3d dataset for full-surround 3d multi-object detection and tracking in crowded urban scenes. arXiv preprint 1903.01568, 2019.
[28] Matthew Pitropov, Danson Garcia, Jason Rebello, Michael Smart, Carlos Wang, Krzysztof Czarnecki, and Steven Waslander. Canadian adverse driving conditions dataset. arXiv preprint 2001.10117, 2020.
[29] Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[30] R. Rasshofer, M. Spies, and H. Spies. Influences of weather phenomena on automotive laser radar systems. Advances in Radio Science, 9, 2011.
[31] Christos Sakaridis, Dengxin Dai, Simon Hecker, and Luc Van Gool. Model adaptation with synthetic and real data for semantic dense foggy scene understanding. In European Conference on Computer Vision (ECCV), 2018.
[32] Christos Sakaridis, Dengxin Dai, and Luc Van Gool. Semantic foggy scene understanding with synthetic data. International Journal of Computer Vision, 126, 2018.
[33] Christos Sakaridis, Dengxin Dai, and Luc Van Gool. ACDC: The adverse conditions dataset with correspondences for semantic driving scene understanding. In IEEE International Conference on Computer Vision (ICCV), 2021.
[34] Shaoshuai Shi, Chaoxu Guo, Li Jiang, Zhe Wang, Jianping Shi, Xiaogang Wang, and Hongsheng Li. Pv-rcnn: Point-voxel feature set abstraction for 3d object detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[35] Shaoshuai Shi, Li Jiang, Jiajun Deng, Zhe Wang, Chaoxu Guo, Jianping Shi, Xiaogang Wang, and Hongsheng Li. PV-RCNN++: point-voxel feature set abstraction with local vector representation for 3d object detection. arXiv preprint 2102.00463, 2021.
[36] Shaoshuai Shi, Xiaogang Wang, and Hongsheng Li. Pointrcnn: 3d object proposal generation and detection from point cloud. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[37] Shaoshuai Shi, Zhe Wang, Jianping Shi, Xiaogang Wang, and Hongsheng Li. From points to parts: 3d object detection from point cloud with part-aware and part-aggregation network. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020.
[38] Andrea Simonelli, Samuel Rota Bulo, Lorenzo Porzi, Manuel Lopez-Antequera, and Peter Kontschieder. Disentangling monocular 3d object detection. In IEEE International Conference on Computer Vision (ICCV), 2019.
[39] Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Yu Zhang, Jonathon Shlens, Zhifeng Chen, and Dragomir Anguelov. Scalability in perception for autonomous driving: Waymo open dataset. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[40] OpenPCDet Development Team. Openpcdet: An open-source toolbox for 3d object detection from point clouds. https://github.com/open-mmlab/OpenPCDet, 2020.
[41] Maxime Tremblay, Shirsendu S. Halder, Raoul de Charette, and Jean-François Lalonde. Rain rendering for evaluating and improving robustness to bad weather. International Journal of Computer Vision (IJCV), 126, 2021.
[42] Velodyne. Hdl-64e user’s manual, 2007.
[43] Sourabh Vora, Alex H. Lang, Bassam Helou, and Oscar Beijbom. Pointpainting: Sequential fusion for 3d object detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[44] A. M. Wallace, A. Halimi, and G. S. Buller. Full waveform lidar for adverse weather conditions. IEEE Transactions on Vehicular Technology, 69, 2020.
[45] Jacek Wojtanowski, Marek Zygmunt, Mirosława Kaszczuk, Z. Mierczyk, and Michał Muzal. Comparison of 905 nm and 1550 nm semiconductor laser rangefinders’ performance deterioration due to adverse environmental conditions. OptoElectronics Review, 22, 2014.
[46] Magnus Wrenninge and Jonas Unger. Synscapes: A photorealistic synthetic dataset for street scene parsing. arXiv preprint 1810.08705, 2018.
[47] Yan Yan, Yuxing Mao, and Bo Li. Second: Sparsely embedded convolutional detection. Sensors, 18, 2018.
[48] Ji Zhang and Sanjiv Singh. Loam : Lidar odometry and mapping in real-time. Robotics: Science and Systems Conference (RSS), 2014.
[49] Y. Zhou and O. Tuzel. Voxelnet: End-to-end learning for point cloud based 3d object detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[50] Claudia Álvarez Aparicio, Ángel Manuel Guerrero Higueras, Francisco Javier Rodríguez-Lera, Jonatan Ginés Clavero, Francisco Martín Rico, and Vicente Matellán. People detection and tracking using lidar sensors. Robotics, 8, 2019.

Supplementary Material


Figure 1: Size comparison of STF [1] and the well-known KITTI [2] dataset for 3D object detection.

  In this supplementary document, we provide additional information on the Seeing Through Fog (STF) [1] dataset (Section 1). We show an extended comparison between our fog simulation and the one in [1] (Section 2). Finally, we also present additional 2D, 3D, and bird's eye view (BEV) results for the 3D object detection methods covered in the main paper (Section 3).

1. Other details of Seeing Through Fog

  Figure 1 shows a size comparison between Seeing Through Fog (STF) [1] and the well-known KITTI [2] dataset. While the training set of STF [1] is of roughly similar size, its validation and test sets are much smaller. However, STF [1] has the advantage of providing separate test splits for different adverse weather conditions. For example, it contains 946 scenes recorded in light fog and 786 scenes recorded in dense fog.

  Each scene is additionally labeled as being collected during the day or at night. This information is shown in Figure 2. In Figure 3, we show the number of annotated objects per class and split. While the numbers of cars and pedestrians are fairly balanced, we can see that cyclists are rather underrepresented in STF [1].


Figure 2: Number of daytime and nighttime scenes per split in STF [1].


Figure 3: Number of targets per class and split in STF [1].

  As stated in the main paper, we hypothesize that this is why all reported methods perform comparatively poorly on cyclists, compared to the other two classes, cars and pedestrians.

2. Additional fog simulation results

  In Figures 5 and 6, we visually compare our fog simulation with the one of Bijelic et al. [1]. The top row shows the clear-weather point cloud; from top to bottom, we then continue with α values of 0.005, 0.01, 0.02, 0.03, 0.06, 0.12, and 0.3, which correspond to approximately 600, 300, 150, 100, 50, 25, and 10 m meteorological optical range (MOR). In Fig. 4, we show the image captured simultaneously by the camera. In Fig. 6, we can see that both our simulation and the simulation in [1] modify the intensity of the point cloud in a similar way. In Fig. 5, however, we can clearly see that, in contrast to our fog simulation, the fog simulation in [1] fails to reproduce the fog-induced returns. In the fog simulation of [1], such returns only become apparent at very low MOR (i.e. below 25 m). Clearly, this does not match what we observe in real measurements.

3. Other quantitative results

  Early in our experiments, we found that the official KITTI [2] 3D average precision metric, with intersection over union (IoU) thresholds of 0.7, 0.5, and 0.5 for cars, cyclists, and pedestrians, respectively, is too strict for evaluation on the adverse weather splits of STF [1]. The performance of all reported methods is low (e.g., the highest moderate-difficulty AP on the dense fog test split is 24% for the car class and 16% for the pedestrian class). We conclude that these IoU thresholds put too much emphasis on the precise localization of the objects of interest.

  Accurate predictions are more challenging in adverse weather than in clear weather, and we argue that it is more important to focus on the detection itself than on strictly localizing the objects. That is, it is better to accept a coarse detection of an object than to discard it because its localization is imprecise. This is why, when evaluating on the STF [1] dataset, we choose to relax the IoU thresholds to 0.5 for the car class and 0.25 for cyclists and pedestrians, effectively reducing the emphasis placed on localization accuracy.

  In Table 1, we report the bird's eye view (BEV) results for the dense fog test split of STF [1]. We can see that, as expected, the BEV performance is slightly higher than the 3D performance for all methods and classes, and that, with few exceptions, the training runs with our fog simulation again outperform the competing training schemes. In Table 2, we report the BEV results for the car class on the dense fog, light fog, and clear-weather test sets.

  In Tables 3-8, we present results using the official KITTI [2] IoU stringency of 0.7 for the car class and 0.5 for cyclists and pedestrians. For all tables, we use the same snapshots as for the quantitative results in the main paper, and all average precision (AP) and mean average precision (mAP) numbers reported here are again calculated using the 40 recall positions suggested by Simonelli et al. [7].


Figure 4: Camera references for the point clouds shown in Figures 5 and 6. On the right hand side we have open space, and on the left hand side we have vegetation.

References

[1] M. Bijelic, T. Gruber, F. Mannan, F. Kraus, W. Ritter, K. Dietmayer, and F. Heide. Seeing through fog without seeing fog: Deep multimodal sensor fusion in unseen adverse weather. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[2] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? the kitti vision benchmark suite. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[3] A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom. Pointpillars: Fast encoders for object detection from point clouds. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[4] S. Shi, C. Guo, L. Jiang, Z. Wang, J. Shi, X. Wang, and H. Li. Pv-rcnn: Point-voxel feature set abstraction for 3d object detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[5] S. Shi, X. Wang, and H. Li. Pointrcnn: 3d object proposal generation and detection from point cloud. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[6] S. Shi, Z. Wang, J. Shi, X. Wang, and H. Li. From points to parts: 3d object detection from point cloud with part-aware and part-aggregation network. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020.
[7] A. Simonelli, S. R. Bulo, L. Porzi, M. Lopez-Antequera, and P. Kontschieder. Disentangling monocular 3d object detection. In IEEE International Conference on Computer Vision (ICCV), 2019.
[8] Y. Yan, Y. Mao, and B. Li. Second: Sparsely embedded convolutional detection. Sensors, 18, 2018.


Table 1: Bird's eye view (BEV) average precision (AP) results for the STF [1] dense fog test split. †Sunny weather baseline
*Fog simulation applied to each training example with α uniformly sampled from [0, 0.005, 0.01, 0.02, 0.03, 0.06]


Table 2: Results on all relevant STF [1] test splits for car BEV AP@0.5. †Sunny weather baseline
*Fog simulation applied to each training example with α uniformly sampled from [0, 0.005, 0.01, 0.02, 0.03, 0.06]


Table 3: 3D average precision (AP) results for the STF [1] dense fog test split. †Sunny weather baseline
*Fog simulation applied to each training example with α uniformly sampled from [0, 0.005, 0.01, 0.02, 0.03, 0.06]


Table 4: Results on all relevant STF [1] test splits for car 3D AP@0.7. †Sunny weather baseline
*Fog simulation applied to each training example with α uniformly sampled from [0, 0.005, 0.01, 0.02, 0.03, 0.06]


Table 5: BEV average precision (AP) results for the STF [1] dense fog test split. †Sunny weather baseline
*Fog simulation applied to each training example with α uniformly sampled from [0, 0.005, 0.01, 0.02, 0.03, 0.06]


Table 6: Results on all relevant STF [1] test splits for car BEV AP@0.7. †Sunny weather baseline
*Fog simulation applied to each training example with α uniformly sampled from [0, 0.005, 0.01, 0.02, 0.03, 0.06]


Table 7: 2D Average Precision (AP) results for STF [1] dense fog test split. †Sunny weather baseline
*Fog simulation applied to each training example with α uniformly sampled from [0, 0.005, 0.01, 0.02, 0.03, 0.06]


Table 8: Results on all relevant STF [1] test splits for car 2D AP@0.7. †Sunny weather baseline
*Fog simulation applied to each training example with α uniformly sampled from [0, 0.005, 0.01, 0.02, 0.03, 0.06]


Figure 5: Color coding by height (z value).


Figure 6: Color coding by intensity.

Source: blog.csdn.net/i6101206007/article/details/129592388