Classic Literature Reading--Poisson Surface Reconstruction for LiDAR Odometry and Mapping

0. Summary

Accurate localization and mapping of the environment are fundamental components of most autonomous driving systems. In this paper, we propose a new method for LiDAR odometry and mapping, focusing on improving the quality of the map while estimating the pose of the vehicle. Our method performs frame-to-mesh ICP, but in contrast to other SLAM methods, we represent the map as a triangular mesh computed by Poisson surface reconstruction. We perform surface reconstruction in a sliding-window fashion over a series of past scans. In this way, we obtain accurate local maps that are well suited for registration and can also be combined into global maps. This allows us to build a 3D map that shows much more geometric detail than common mapping methods that rely on truncated signed distance functions or surfels. We show quantitatively and qualitatively that our map provides higher geometric accuracy than other map representations. Moreover, our map is compact and can be used for LiDAR odometry estimation based on ray-casting data association.

1. Introduction

Most autonomous systems cannot navigate effectively without a map of the environment and knowledge about their pose. Therefore, localization, mapping, and simultaneous localization and mapping (SLAM) [4], [34] are important components of autonomous systems.

We use a rotating LiDAR sensor to reconstruct the environment and investigate the use of an alternative scene reconstruction for mapping and registration. Scene reconstruction is critical as it is used to register incoming scans. For accurate relative pose estimation and convincing mapping results, scene reconstruction must capture and represent the environment with a high level of detail.

The goal of this paper is to improve the geometric accuracy of LiDAR-based mapping while estimating the time-varying pose of the vehicle with low drift. We achieve this by using a triangular mesh representation computed by a Poisson surface reconstruction technique [15]. This is in contrast to other state-of-the-art methods, which typically use surfels [1] or truncated signed distance functions [7], [23] as map representations, often providing rather low reconstruction quality, at least for large outdoor scenes. With our approach, we reconstruct meshes from LiDAR data of the robot's outdoor environment at a quality previously only achieved at the object level, for indoor scenes, with ground-based scanners, or by aggregating multiple passes over the same scene.

The main contribution of this paper is a novel LiDAR odometry and mapping system based on surface reconstruction methods that provides accurate geometric maps. We aggregate individual scans into a local point cloud and use it to reconstruct a triangular mesh of the scene. Our experimental evaluation shows that this triangular mesh is well suited for the registration of 3D LiDAR scans, as it is relatively compact, preserves fairly detailed structures, and allows accurate frame-to-mesh registration. This leads to a new 3D LiDAR-based mapping method that provides accurate geometric maps and can be used for pose estimation, as shown in Figure 1. We show that the proposed map representation (i) is an accurate geometric representation of the environment; (ii) is more memory efficient than other map representations; and (iii) allows frame-to-mesh registration algorithms to precisely register incoming scans. We support our claims with experimental evaluations on synthetic and real-world data.

2. Related work

SLAM has been studied for decades; here we focus on 3D LiDAR-based methods.

Laser-based SLAM systems either rely on sparse features [39], [22] or use dense map representations [1], [8] for registration. Deschaud [8] proposes to use Implicit Moving Least Squares (IMLS) surfaces to represent the map. Behley [1] uses a surfel-based representation for mapping. In contrast to these methods, we register LiDAR scans against a dense map represented by a triangular mesh instead of surfels or IMLS surfaces.

A common technique to obtain triangular meshes from point clouds is 3D surface reconstruction [2]. Traditional approaches determine the implicit functions that model the underlying surfaces, for example, using tangent planes [12], radial basis functions [5], truncated signed distance functions (TSDF) [7] or polynomial representations [17]. Poisson surface reconstruction [14], [15] provides accurate geometric reconstruction based on this principle.

In contrast to feature-based SLAM systems, the goal of dense approaches is to use all input data and aggregate it into a dense map. A popular approach is to use the aforementioned TSDFs, popularized by Newcombe et al. [23] for RGB-D mapping. Most TSDF-based methods require the volume of the environment to be known in advance, since they rely on a fixed voxel grid, but Whelan et al. [37] propose a rolling grid to alleviate this limitation, storing intermediate results in a triangular mesh. Other approaches use octrees [35] or allocate blocks on demand [24], [26]. In contrast to these methods, we neither assume the size of the environment to be known nor rely on specialized data structures to build the map.

In robotics, Marton et al. [19] adopted a triangular mesh representation for reconstruction. Such representations have also been explored in recent years by visual-inertial systems [30], LiDAR methods [6], [31] and pure visual systems [27]. In contrast to our work, these methods typically compute a sparse reconstruction of the traversed environment, whereas our goal is to reconstruct a continuous triangular surface, capturing geometric details.

Furthermore, 3D LiDAR-based methods use variants of TSDFs to handle larger volumes and build globally consistent maps [21], [25], [28], [29]. The approach of [18] utilizes TSDFs to reconstruct large outdoor environments, but requires multiple passes over the same scene to obtain convincing reconstruction results. We propose a method that achieves a high level of detail in only one pass.

3. Our solution

We perform the following three steps for each scan: first, we compute per-point normals; second, we register the scan to a local map; and third, we fuse the registered scan into a global map. We further propose a novel frame-to-mesh registration strategy that utilizes triangular meshes for mapping.

Our method distinguishes between local and global maps. The local map is used to perform odometry estimation and is constructed from the last N local scans. The global map is an aggregated mesh of the entire environment.

A. Normal Computation and Point Cloud Registration

To compute normals, we project the point cloud into a range image using a spherical projection and estimate normal vectors from cross products of neighboring pixels [1]. While this is sometimes less accurate than estimating normals via a principal component analysis of the covariance of a point neighborhood, it is much more efficient, as it avoids neighborhood searches altogether.
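To illustrate this cross-product approach, the following is a minimal NumPy sketch that estimates normals for an organized range image of 3D points. The array layout and function name are illustrative choices, not taken from the paper's implementation:

```python
import numpy as np

def normals_from_range_image(points):
    """Estimate per-pixel normals from an organized (H, W, 3) point image.

    Uses the cross product of differences to the right and downward
    neighbors, avoiding any explicit neighborhood search.
    """
    dx = points[:, 1:, :] - points[:, :-1, :]   # horizontal differences, (H, W-1, 3)
    dy = points[1:, :, :] - points[:-1, :, :]   # vertical differences, (H-1, W, 3)
    # Crop both to the common valid region so the shapes align.
    dx = dx[:-1, :, :]
    dy = dy[:, :-1, :]
    n = np.cross(dx, dy)                        # unnormalized normals, (H-1, W-1, 3)
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    norm[norm == 0.0] = 1.0                     # guard against degenerate pixels
    return n / norm
```

For a planar patch of points, every estimated normal points along the plane's normal direction, which makes the routine easy to sanity-check.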

For point cloud registration, we iteratively perform data association of the point cloud to the triangle mesh and determine pose increments to minimize the error metric.

For data association between point clouds, nearest-point association via neighbor search or projection [32] is a common choice. We could also apply this strategy to our mesh representation by searching for nearest neighbors among the triangle vertices, but this is suboptimal, as we show in the experimental evaluation.

We instead use ray-casting to determine ray-triangle intersections. For each intersection, we extract the intersection point and the normal of the intersected triangle. To do this, we first apply the last estimated pose, T_{t-1} ∈ ℝ^{4×4}, to the current scan as an initial alignment. Then we create a set of n rays R = {r_i}. Each ray r_i is given by o_i + τ d_i, with origin o_i = t_t^k at the current estimated sensor position and direction d_i = T_t^k p_i, passing through each point p_i of the current scan. Here, T_t^k is the estimated pose at iteration k, and t_t^k ∈ ℝ^4 is the translation part of T_t^k.

Intersecting each ray r_i with the mesh yields a point q_i corresponding to p_i, and the normal of the intersected triangle yields the corresponding normal n_i. To compute the relative transformation between scan and mesh, we can now use different error metrics E(·,·), such as point-to-point, point-to-plane, or plane-to-plane errors [3], [32], [33].
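The ray-casting step relies on standard ray-triangle intersection tests. As a sketch (the paper's implementation would cast rays against the whole mesh via an acceleration structure), here is the classic Möller–Trumbore test for a single triangle in NumPy:

```python
import numpy as np

def ray_triangle_intersect(origin, direction, v0, v1, v2, eps=1e-9):
    """Möller-Trumbore ray/triangle intersection.

    Returns the intersection point, or None if the ray misses the
    triangle or the triangle lies behind the ray origin.
    """
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direction, e2)
    det = np.dot(e1, p)
    if abs(det) < eps:                 # ray is parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    s = origin - v0
    u = np.dot(s, p) * inv_det         # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = np.cross(s, e1)
    v = np.dot(direction, q) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = np.dot(e2, q) * inv_det        # distance along the ray
    if t < 0.0:                        # intersection behind the origin
        return None
    return origin + t * direction
```

Each hit supplies the point q_i; the corresponding normal n_i comes from the intersected triangle itself.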

The data association step can also produce false correspondences, where a point from one surface is associated with an intersection point on a mesh triangle belonging to another surface. This typically happens when a ray does not hit anything close to the surface and instead hits a triangle far away. Therefore, as outlier rejection, we remove from the set of correspondences C all pairs (p_i, q_i) ∈ C satisfying ||p_i − q_i|| > σ_d. In our implementation, we use σ_d = 1 m.
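The σ_d-based rejection amounts to a one-line distance filter; a minimal NumPy sketch (array and function names are illustrative):

```python
import numpy as np

def reject_outliers(points, hits, sigma_d=1.0):
    """Drop correspondences whose point-to-intersection distance
    exceeds sigma_d (the 1 m threshold from the text).

    points, hits: (N, 3) arrays of matched scan points p_i and
    mesh intersection points q_i.
    """
    keep = np.linalg.norm(points - hits, axis=1) <= sigma_d
    return points[keep], hits[keep]
```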

We optimize the following objective:

E(C) = Σ_{(p_i, q_i) ∈ C} ρ( n_i^T (T p_i − q_i) ),

where T is the transformation to be estimated and ρ is a robust kernel applied to the chosen error metric.

In our experiments, we found that the best-performing metric is the point-to-plane metric [32]. To reduce the influence of outliers not filtered by the outlier rejection, we use the Huber weighting kernel [13].
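To illustrate, here is a minimal sketch of the point-to-plane residuals and the corresponding Huber weights as used in iteratively reweighted least squares; the delta value and function names are illustrative choices, not values from the paper:

```python
import numpy as np

def point_to_plane_residuals(points, hits, normals):
    """Signed point-to-plane residuals n_i . (p_i - q_i) for
    (N, 3) arrays of points, intersections, and triangle normals."""
    return np.einsum('ij,ij->i', normals, points - hits)

def huber_weights(residuals, delta=1.0):
    """IRLS weights for the Huber kernel: w = 1 inside the quadratic
    region (|r| <= delta), w = delta / |r| in the linear region."""
    r = np.abs(residuals)
    w = np.ones_like(r)
    mask = r > delta
    w[mask] = delta / r[mask]
    return w
```

Large residuals thus contribute with down-weighted, linear (rather than quadratic) cost, which dampens the effect of outliers that survive the distance-based rejection.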

The main advantage of the proposed data association is that it does not require nearest-neighbor computations; instead, the association step ray-casts the mesh representation, which can be performed much faster, especially for high-resolution meshes. Nevertheless, this approach does not handle large rotational motions well and requires a good initial estimate to converge, although this also applies to nearest-neighbor-based methods.

B. Meshing Algorithm

A common technique for 3D surface reconstruction from point sets is to construct an implicit function that recovers the underlying surface of the input data [12]. This implicit function f is usually defined as a scalar field over ℝ³, namely f: ℝ³ → ℝ, where the zero level set of f represents the surface we want to model. A popular technique in robotics and SLAM is to approximate f by a signed distance function storing the projected distance from the sensor to the surface [7].

In contrast, our work explores the use of Poisson surface reconstruction [14], [15] to construct consistent, smooth, high-quality maps for mobile robots, especially autonomous vehicles. For the details of the reconstruction algorithm, we refer to the original publications [14], [15]. Our aim is to investigate the use of triangular meshes for mapping rather than the algorithmic choices of the reconstruction itself.

Next, we explain mesh postprocessing. The Poisson reconstruction described above is designed to recover the closed surface of a single object in three dimensions, as shown in Figure 2.

Our 3D world, especially outdoor environments, is not composed of closed surfaces. Therefore, we need to refine the reconstructed surface in a post-processing step that removes low-density vertices. The density σ(v) of a vertex v of the mesh measures how many points of the input point cloud support v. Intuitively, a low value means that a vertex is supported by only a small number of points, and therefore the raw LiDAR scan contains only sparse measurements, or none at all, in that region (the Poisson surface reconstruction algorithm also extrapolates the surface where there is no data). After reconstructing the mesh, we compute the distribution of vertex densities, shown in the histogram on the right of Figure 2. Vertices with a high density, i.e., those spatially closer to the point cloud data, are colored yellow to red in the figure. We prune low-density vertices regardless of the size of the mesh triangles, deciding solely based on the density σ(v). Considering the cumulative density histogram, we prune the 10% of vertices with the lowest density values.
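The density-based pruning can be sketched as follows, assuming per-vertex densities are available from the Poisson reconstruction (common implementations expose them); the 10% fraction matches the text, the rest is an illustrative sketch:

```python
import numpy as np

def prune_low_density(vertices, densities, frac=0.10):
    """Remove the lowest-density fraction of vertices.

    vertices: (V, 3) mesh vertices; densities: (V,) per-vertex Poisson
    densities sigma(v). Returns the kept vertices and a boolean keep
    mask, which can then be used to drop triangles that reference
    any pruned vertex.
    """
    threshold = np.quantile(densities, frac)
    keep = densities > threshold
    return vertices[keep], keep
```

Because the decision depends only on σ(v), large triangles in well-supported regions survive while spurious extrapolated geometry is removed.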

This post-processing produces a tighter reconstruction of the input data showing few artifacts, which allows us to progressively build a global mesh as described in Sec. III C. Note that without this step, it is impossible to construct a global map of the environment due to the artifacts shown in blue in Fig. 2.

Note that, as an interesting side effect, this density-based filtering also tends to eliminate most moving objects in the scene, since 3D points on the surface of moving objects usually only support a small number of triangles as the surface changes with each scan. This results in a low density of triangles on moving objects, so the surface is not reconstructed at those locations.

C. Local and Global Map

In our approach, we distinguish between local and global maps. The local map is built from the last N aggregated scans. The global map is only used for visualization and as the final output of our approach; it is not otherwise used by our method, although this would change if we added loop closure detection. Every time a new LiDAR frame is registered to the local map, a new mesh is reconstructed from the local map. This results in a rolling-window mesh that moves with the estimated pose of the vehicle and stores enough information to register new incoming scans. During the first N scans, we disable the mesh reconstruction module and rely on standard point-to-plane ICP to estimate the vehicle's pose. After every M registered scans, the current local mesh is integrated into the global mesh map. This means that the global mesh is updated only after M scans have arrived and been registered (as opposed to the local mesh, which is updated each time a new scan arrives). To do this, we add all triangles of the local mesh to the global mesh and then remove duplicate triangles that may occur due to overlap between local map regions. In our implementation, we use N = M = 30.
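The local/global map bookkeeping described above can be sketched as follows; this is a minimal illustration of the N-scan window and the M-scan global update (class and attribute names are illustrative, not from the authors' code):

```python
from collections import deque

class SlidingWindowMap:
    """Minimal sketch of the local/global map bookkeeping:
    keep the last N registered scans for local meshing and
    integrate the local mesh into the global mesh every M scans."""

    def __init__(self, n_local=30, m_global=30):
        self.scans = deque(maxlen=n_local)  # last N registered scans
        self.m_global = m_global
        self.count = 0                      # total scans registered so far
        self.global_updates = 0             # times the global mesh was extended

    def add_scan(self, scan):
        """Register one scan; the oldest scan drops out automatically."""
        self.scans.append(scan)
        self.count += 1
        # In the full system, the local mesh would be rebuilt here
        # from self.scans via Poisson surface reconstruction.
        if self.count % self.m_global == 0:
            # Integrate the local mesh into the global mesh, removing
            # duplicate triangles from overlapping windows.
            self.global_updates += 1

    @property
    def local_size(self):
        return len(self.scans)
```

With N = M = 30, the local window never exceeds 30 scans, and the global mesh is touched only once per 30 registered scans.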

4. Experimental evaluation

Our algorithm runs entirely on the CPU and was tested on an Intel Xeon W-2145 with 8 cores @ 3.70 GHz and 32 GB of RAM. For all our experiments, we use the default SuMa settings and a voxel size of 0.10 m for the TSDF. For our method, we set the depth of the octree used in the Poisson surface reconstruction [14] to 10. For real-world experiments, we use the KITTI dataset.

1) Mapping accuracy: Using distance-based accuracy metrics, we compare our map against TSDF- and surfel-based representations. Qualitative results are shown in Figure 3, and the corresponding quantitative results are reported in Tab. I.

Figure 3: Qualitative examples of map accuracy. The first row shows the three map representations, namely TSDF, surfels, and our method. The second row shows the dense ground-truth (GT) point cloud used to compute the metrics in Tab. I. In the GT point cloud, we highlight in yellow the points whose distance to the closest point of the reconstructed model shown in the first row is greater than σ_d = 3 cm. Intuitively, the more yellow points, the more errors or gaps the model contains.

2) Memory consumption: Figure 4 shows the memory consumption over time of the different methods for two sequences of the KITTI odometry benchmark. We see that our method scales well with the number of input scans, whereas surfel-based or point cloud maps require more memory, making them infeasible to run on mobile platforms. TSDF-based mapping exhibits similar memory consumption, but with reduced mapping accuracy, as shown in Sec. IV-B.

Figure 4: Memory consumption of different map representations for two KITTI sequences. On the left, we show the triangle mesh constructed by our method and its final size. Sequence 04 is a rural setting, and Sequence 07 is recorded in an urban setting. We see that point clouds and surfels lead to higher memory usage. TSDF performs similarly to our method in terms of memory usage, but, as shown in Figure 3, our method outperforms TSDF in terms of geometric accuracy.

3) Odometry and localization accuracy: Pose estimation performance is reported in Tab. II. Comparing the poses estimated by our system with the ground-truth poses provided by KITTI, our method provides solid pose estimation performance.

Tab. II: Odometry estimation results on the KITTI odometry benchmark [10]. Rows highlighted in gray report translational errors, and the rows below them report rotational errors. All errors are averaged over trajectories of lengths from 100 to 800 m. Translational errors are given in %, and relative rotational errors in degrees per 100 m. DA stands for data association, RC for ray casting, and NN for nearest neighbor. Bold numbers indicate the best method for a given sequence.

4) Registration algorithm: To investigate the accuracy of the registration method, we compute the average translation and rotation errors over the complete training sequences of the dataset. We see that the proposed registration algorithm scales better as the size of the input mesh increases. The results of this experiment are shown in Tab. III.

Tab. III: Evaluation of ray-casting and mesh-sampling registration on the complete KITTI [11] training sequences. Relative errors are averaged over trajectories of lengths from 100 to 800 m. Relative translational errors (t_err) are given in % and relative rotational errors (r_err) in degrees. Runtimes are given in milliseconds.

5) Runtime: Preprocessing and normal estimation take 45 ms per scan on average, and the scan matching algorithm takes an additional 500 ms. However, the bottleneck is the meshing algorithm, which takes 5 s on average to execute on the CPU. This makes our method infeasible for online operation on autonomous vehicles. In this work, we mainly focus on the use of triangular meshes for building SLAM systems and show that both the reconstruction quality and the pose estimation accuracy are promising. In future work, we will study techniques to optimize the meshing algorithm for online performance, for example by running the reconstruction algorithm on a GPU; in addition, GPU-enabled ray-tracing engines such as NVIDIA OptiX can be used to accelerate the ray-casting algorithm.

5. Conclusion

In this paper, we propose a new method for simultaneous odometry estimation and mapping using 3D LiDAR. Our method performs a novel frame-to-mesh registration, but in contrast to other SLAM or odometry and mapping methods, we represent the map as a triangular mesh computed via Poisson surface reconstruction in a sliding-window fashion over past scans. We show that we obtain high-quality local meshes that display more detail than common alternatives such as state-of-the-art TSDF or surfel representations. We also show that our map representation is well suited for incremental scan registration for pose estimation. For future work, our system could be extended with loop closure detection [38] and a pose graph optimization framework for efficient dense map correction.


Origin blog.csdn.net/lovely_yoshino/article/details/131750500