Paper Interpretation | Center-based 3D Object Detection and Tracking

Original | Text by BFT Robot


CenterPoint differs from traditional box-based 3D object detectors and trackers in that it represents, detects, and tracks 3D objects as points rather than bounding boxes. This has several advantages: it reduces the search space of the detector, it simplifies downstream tasks such as tracking, and it enables efficient two-stage refinement modules that are much faster than previous approaches. The point representation also allows the backbone network to learn the rotational invariance of objects and the rotational equivariance of their relative rotation. Detection then reduces to simple local peak extraction with refinement, and tracking reduces to closest-distance matching.
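The "local peak extraction" step can be illustrated with a minimal sketch: keep only heatmap cells that dominate their local neighborhood and exceed a score threshold. This is an illustrative stand-in, not the paper's implementation; the `threshold` and `window` values are assumed.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def extract_peaks(heatmap, threshold=0.3, window=3):
    """Return (row, col) cells that are local maxima of `heatmap`
    within a `window`-sized neighborhood and score above `threshold`.
    This max-pooling comparison replaces box-based NMS in
    center-based detectors (sketch; hyperparameters assumed)."""
    local_max = maximum_filter(heatmap, size=window, mode="constant")
    peaks = (heatmap == local_max) & (heatmap > threshold)
    ys, xs = np.nonzero(peaks)
    return list(zip(ys.tolist(), xs.tolist()))

heatmap = np.zeros((6, 6))
heatmap[1, 1] = 0.9   # one strong object center
heatmap[4, 4] = 0.6   # a second, weaker center
heatmap[4, 5] = 0.5   # suppressed: not a local max near (4, 4)
print(extract_peaks(heatmap))  # [(1, 1), (4, 4)]
```

Because points have no orientation, this peak test over a small window is all the "NMS" a center-based head needs.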


Center-based framework for detecting and tracking objects

This paper presents a new framework called CenterPoint, which represents, detects, and tracks 3D objects as points instead of bounding boxes. The method simplifies the detection and tracking pipeline and achieves state-of-the-art performance on benchmark datasets. The authors introduce a new center-point detection head but rely on existing 3D backbone networks such as VoxelNet or PointPillars.

Our method overcomes the challenges of traditional box-based detectors and simplifies 3D object tracking. We first review the current state of the art in 3D object detection and tracking, including box-based, anchor-based, and point-based methods, along with the popular datasets and evaluation metrics in this field. We then introduce the main ideas and contributions of our approach and describe the design and implementation of the CenterPoint framework in detail. Finally, we conduct experiments on several benchmark datasets and demonstrate the superior performance of our method.

CenterPoint first detects object centers with a keypoint detector and regresses to the remaining properties, including 3D size, 3D orientation, and velocity. In the second stage, it refines these estimates using additional point features on the object. The resulting detection and tracking algorithms are simple, efficient, and effective. CenterPoint achieved state-of-the-art 3D detection and tracking performance on the nuScenes benchmark, with a single-model NDS of 65.5 and AMOTA of 63.8. On the Waymo Open Dataset, CenterPoint significantly outperforms all previous single-model methods and ranks among the top of all LiDAR-only entries.
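The "regresses to the remaining properties" step amounts to reading dense per-pixel regression maps at each detected center. A minimal sketch, with illustrative head names (`size`, `yaw`) that are assumptions rather than the paper's exact outputs:

```python
import numpy as np

def decode_at_peaks(peaks, reg_maps):
    """For each detected center (row, col), read every dense
    regression map at that location to assemble the object's
    attributes (sketch; head names are illustrative)."""
    objects = []
    for (y, x) in peaks:
        objects.append({name: m[y, x].tolist() for name, m in reg_maps.items()})
    return objects

H, W = 4, 4
reg = {"size": np.ones((H, W, 3)),   # (H, W, 3): per-cell 3D size
       "yaw": np.zeros((H, W, 1))}   # (H, W, 1): per-cell heading
print(decode_at_peaks([(2, 3)], reg))
# [{'size': [1.0, 1.0, 1.0], 'yaw': [0.0]}]
```

Each attribute head is trained only at ground-truth center locations, so inference is a simple gather at the extracted peaks.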

An overview of the CenterPoint framework

The article describes using a standard LiDAR-based backbone network such as VoxelNet or PointPillars to build a representation of the input point cloud. It then flattens this representation into a bird's-eye view and uses a standard image-based keypoint detector to find object centers. For each detected center, it regresses from the point features at the center location to all other object properties such as 3D size, orientation, and velocity. A lightweight second stage then refines object locations: it extracts point features at the 3D center of each face of the estimated 3D bounding box, recovering the local geometric information lost to striding and a limited receptive field, and brings a considerable performance improvement at small cost.
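The face centers the second stage samples can be computed from the box parameters; in bird's-eye view the top and bottom faces project onto the box center, leaving the four side faces. A minimal geometric sketch, with assumed conventions (center `(x, y)`, extent `length` along the heading, `width` across it, `yaw` in radians):

```python
import math

def bev_face_centers(x, y, length, width, yaw):
    """Centers of the four side faces of a box in bird's-eye view.
    Illustrative helper; parameter names and conventions are assumed."""
    c, s = math.cos(yaw), math.sin(yaw)
    offsets = [( length / 2, 0), (-length / 2, 0),   # front / back faces
               (0,  width / 2), (0, -width / 2)]     # left / right faces
    # Rotate each box-frame offset by yaw and translate to the center.
    return [(x + c * dx - s * dy, y + s * dx + c * dy) for dx, dy in offsets]

# Axis-aligned 4 m x 2 m box (yaw = 0) centered at the origin:
print(bev_face_centers(0.0, 0.0, 4.0, 2.0, 0.0))
# [(2.0, 0.0), (-2.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
```

The refinement stage would bilinearly sample the 2D feature map at these points (plus the box center) to score and adjust each proposal.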

01

Experimental results

The article first reports 3D detection results on the Waymo and nuScenes test sets, both obtained with a single CenterPoint-Voxel model. On the Waymo test set, the model achieves 71.8 Level 2 mAPH for vehicle detection and 66.4 Level 2 mAPH for pedestrian detection, surpassing the previous best method by 7.1% mAPH on vehicles and 10.6% mAPH on pedestrians. On nuScenes, the single model outperforms last year's challenge winner CBGS, which used multi-scale inputs and a multi-model ensemble, by 5.2% mAP and 2.2% NDS.


The model also significantly outperforms all other submissions on the neural planning metric (PKL), a hidden metric the organizers evaluate after leaderboard submission. This highlights the generalization ability of the framework.

For 3D tracking, CenterPoint's performance on the Waymo test set shows that the tracker requires no separate motion model and adds negligible runtime, about 1 ms on top of detection.
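The tracker works because the detector already predicts velocity: each detection's center is projected back in time by its velocity and greedily matched to the nearest unmatched track from the previous frame. A minimal sketch; the `max_dist` gating radius and the unsorted greedy order are assumptions (the paper's tracker processes detections in confidence order):

```python
import math

def greedy_match(tracks, detections, dt, max_dist=2.0):
    """Greedy closest-distance association in place of a motion model.
    tracks:     list of (x, y) centers from the previous frame
    detections: list of (x, y, vx, vy) for the current frame
    returns:    list of (det_index, track_index or None)"""
    free = set(range(len(tracks)))
    matches = []
    for i, (x, y, vx, vy) in enumerate(detections):
        bx, by = x - vx * dt, y - vy * dt     # back-project by velocity
        best, best_d = None, max_dist
        for j in free:
            d = math.hypot(bx - tracks[j][0], by - tracks[j][1])
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            free.discard(best)
        matches.append((i, best))              # None => start a new track
    return matches

prev = [(0.0, 0.0), (10.0, 0.0)]
curr = [(1.0, 0.0, 10.0, 0.0),    # moved +1 m at 10 m/s over dt = 0.1 s
        (10.5, 0.0, 5.0, 0.0)]
print(greedy_match(prev, curr, dt=0.1))  # [(0, 0), (1, 1)]
```

Since back-projected centers land on the previous frame's centers when the velocity estimate is accurate, a plain nearest-distance match suffices and costs almost nothing at runtime.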

The two-stage CenterPoint model in the paper uses only features from the 2D CNN feature map; previous methods have also proposed using voxel features in the second refinement stage.


Qualitative results of CenterPoint on the Waymo validation set

02

Conclusion

This paper proposes a center-based framework for simultaneous 3D object detection and tracking from LiDAR point clouds. It uses a standard 3D point cloud encoder with a few convolutional layers in the head to produce a bird's-eye-view heatmap and other dense regression outputs. CenterPoint is simple, runs in near real time, and achieves state-of-the-art performance on the Waymo and nuScenes benchmarks.

Author | Zhang Zheyu

Typesetting | Xiaohe

Review | Orange

If you have any questions about the content of this article, please contact us and we will respond promptly. If you want to know more cutting-edge information, remember to like and follow~


Origin blog.csdn.net/Hinyeung2021/article/details/132762260