A review of research on 3D target detection methods

[Abstract] 3D target detection is an important fundamental problem in application fields such as autonomous driving, virtual reality, and robotics. Its purpose is to extract from an unordered point cloud the 3D box that best describes a target, for example a box that tightly encloses the points belonging to a pedestrian or a vehicle, and to give the position, size, and orientation of that 3D box. Current methods fall into two main categories: 3D target detection based purely on point clouds acquired by binocular vision, RGB-D cameras, or lidar, and 3D target detection based on multi-modal information from both images and point clouds. This paper first introduces the different representations of 3D point clouds and the corresponding feature extraction methods, and then reviews 3D target detection methods at three levels: traditional machine learning algorithms, non-fusion deep learning algorithms, and deep learning algorithms based on multi-modal fusion. Methods are analyzed and compared both within and across these categories, with an in-depth discussion of the differences and connections between them. Finally, the remaining problems and possible research directions for 3D target detection are discussed, and the mainstream datasets and main evaluation metrics used in 3D target detection research are summarized.

[Keywords] Deep learning; 3D target detection; multi-modal fusion; point cloud; autonomous driving

0 Preface

In application fields such as autonomous driving, robotics, and drones, 3D point clouds are often constructed with lidar, binocular vision, RGB-D cameras, and similar sensors to describe the surrounding environment. However, point cloud data is unordered and lacks semantic information. In order to detect moving targets in a point cloud, or to perform point-cloud-based target detection and human-computer interaction, the 3D box that most accurately describes each target must be extracted from the unordered point cloud, together with the spatial position, size, and orientation of that 3D box. This process is called 3D target detection and is an important foundation for the above applications.
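As a concrete illustration (not taken from any particular surveyed method), a 3D detection result is commonly parameterized by a box center, its dimensions, and a yaw angle about the vertical axis. The minimal Python sketch below shows one such representation and the corresponding corner computation, assuming a right-handed coordinate system with the z-axis pointing up; the class and field names are illustrative only.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Box3D:
    """Minimal 3D bounding box: center position, size, and heading (yaw)."""
    x: float
    y: float
    z: float      # center of the box (meters)
    l: float
    w: float
    h: float      # length, width, height (meters)
    yaw: float    # rotation about the vertical (z) axis, in radians

    def corners(self) -> np.ndarray:
        """Return the 8 box corners as an (8, 3) array in world coordinates."""
        # Corner offsets in the box's local frame (x: length, y: width, z: height).
        dx, dy, dz = self.l / 2, self.w / 2, self.h / 2
        local = np.array([[ dx,  dy,  dz], [ dx,  dy, -dz],
                          [ dx, -dy,  dz], [ dx, -dy, -dz],
                          [-dx,  dy,  dz], [-dx,  dy, -dz],
                          [-dx, -dy,  dz], [-dx, -dy, -dz]])
        # Rotate about z by the yaw angle, then translate to the box center.
        c, s = np.cos(self.yaw), np.sin(self.yaw)
        rot = np.array([[c, -s, 0.0],
                        [s,  c, 0.0],
                        [0.0, 0.0, 1.0]])
        return local @ rot.T + np.array([self.x, self.y, self.z])

# Example: a car-sized box centered 10 m ahead, rotated 30 degrees.
box = Box3D(x=10.0, y=0.0, z=0.8, l=4.5, w=1.8, h=1.6, yaw=np.pi / 6)
print(box.corners())
```

Evaluating a detector then amounts to comparing predicted boxes of this form against ground-truth boxes, typically via 3D intersection-over-union.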

Specifically, the 3D point cloud can be captured by visual sensors (monocular, binocular, or RGB-D cameras) or radar sensors (ultrasonic radar, lidar, or millimeter-wave radar), combined with positioning sensors, inertial measurement units, and so on, and is constructed from the image or point cloud data by visual odometry or radar odometry methods. In a point cloud, any object whose surface

Origin blog.csdn.net/fzq0625/article/details/134916011