Disciplinary frontiers assignment: 3D point cloud deep learning

3D Point Cloud Deep Learning

    Abstract: With the continuous development of 3D cameras and the progress of computer vision, autonomous driving, and related technologies, the need to work with 3D data is becoming increasingly pressing. Deep learning has developed vigorously in recent years, and research based on 3D point clouds has grown with it. As a form of geometric data that can richly express three-dimensional objects, the point cloud has great research value. This article introduces deep learning methods for object recognition, semantic segmentation, and instance segmentation based on 3D point clouds, as well as current problems and possible future development directions.
    Keywords: point cloud; deep learning; computer vision

    As a data type that can accurately describe the position, shape, and size of complex objects, the 3D point cloud is widely used in robotics, computer vision, and autonomous driving thanks to its compact storage, strong descriptive power, and ease of processing. Deep learning has flourished in artificial intelligence in recent years, with many competing schools of thought, and it has already achieved remarkable results on two-dimensional images. With the development of 3D cameras and the emergence of several important 3D datasets, research on deep learning in 3D space has gained further momentum, and more and more methods are being applied to tasks in 3D. However, because three-dimensional data is complex to represent, deep learning still faces many challenges in 3D applications. This article discusses deep learning for 3D point cloud object recognition [1], semantic segmentation, instance segmentation [2], and related tasks, and analyzes some of the latest research and future development directions for point clouds in these task scenarios.

1 Object Recognition Based on 3D Point Cloud
    1.1 Feature Extraction
    Feature extraction mines features relevant to the target task from 3D point cloud data; the feature values of different samples can vary greatly. Common approaches fall into four categories: (1) methods that take point cloud blocks as the unit and divide the cloud according to manually specified block shapes; (2) methods that take single points as the unit and divide the cloud according to features of each point's neighborhood, for example, Shen Y et al. [3] build features by constructing a K-nearest-neighbor graph over point neighborhoods and applying kernel correlation; (3) clustering-based methods that take single objects as the unit, such as the density-based clustering algorithm proposed by Tong Guofeng et al. [4] in 2018 to segment point cloud data; (4) neural network-based feature extraction, using CNNs, convolution filters, multi-layer perceptrons, and similar components, of which the best-known examples are PointNet [5] and PointNet++ [6].
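    To make the neural-network category concrete, the following is a minimal NumPy sketch of the core PointNet idea: a shared per-point MLP followed by a symmetric max-pooling, which makes the global feature invariant to point ordering. The weights and layer sizes are illustrative placeholders, not the actual PointNet architecture.

```python
import numpy as np

def pointnet_like_features(points, w1, w2):
    """Toy PointNet-style global descriptor: a shared per-point MLP followed by
    symmetric max-pooling, so the output does not depend on point order."""
    h = np.maximum(points @ w1, 0.0)   # shared MLP layer 1 (ReLU)
    h = np.maximum(h @ w2, 0.0)        # shared MLP layer 2 (ReLU)
    return h.max(axis=0)               # order-invariant global feature

rng = np.random.default_rng(0)
cloud = rng.normal(size=(1024, 3))                       # a random point cloud
w1, w2 = rng.normal(size=(3, 64)), rng.normal(size=(64, 128))
feat = pointnet_like_features(cloud, w1, w2)
# Shuffling the points leaves the global feature unchanged.
assert np.allclose(feat, pointnet_like_features(rng.permutation(cloud), w1, w2))
```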
    1.2 Feature selection
    The features extracted from a 3D point cloud may include many that are only weakly relevant to the target task, and selecting the most useful ones from among them is the main job of feature selection. Feature selection mainly addresses the curse of dimensionality, the influence of noise, and overfitting. Its methods fall into three categories: (1) principal component analysis, which maps n-dimensional features onto k mutually orthogonal dimensions; for example, Zhang Rui et al. [7] applied principal component analysis to point cloud coordinates in 2014. (2) Ensemble-based methods, which can be divided into Boosting-based [8] and Bagging-based [9] approaches. (3) Neural network-based methods, which learn connection weights and thresholds between neurons to abstract and select features without human intervention; PointNet [5] and PointNet++ [6] are good examples.
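    As an illustration of the principal component analysis category, the sketch below projects per-point features onto their k leading principal components; the 32-dimensional input features are made up for the example and stand in for whatever features were extracted in the previous step.

```python
import numpy as np

def pca_reduce(features, k):
    """Project n-dimensional features onto their k leading principal
    components (mutually orthogonal directions of maximum variance)."""
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)                 # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    top_k = eigvecs[:, np.argsort(eigvals)[::-1][:k]]    # k largest eigenvalues
    return centered @ top_k                              # (N, k) reduced features

rng = np.random.default_rng(1)
feats = rng.normal(size=(500, 32))   # hypothetical 32-D per-point features
reduced = pca_reduce(feats, k=8)     # keep the 8 most informative directions
```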
    1.3 Feature recognition
    Feature recognition for a 3D point cloud classifies the extracted features and assigns a predicted category label to the object represented by a point cloud block. In deep learning the network body is mainly used to extract features, while classification is generally handled by a final fully connected layer, with the score for each predicted category output through a Softmax.
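    A minimal sketch of this final stage, under the simplifying assumption of a single fully connected layer over a global point-cloud descriptor (the weight shapes and the number of categories are placeholders):

```python
import numpy as np

def classify(global_feature, w, b):
    """Fully connected layer + Softmax over a global point-cloud descriptor,
    returning per-category probabilities and the predicted label."""
    logits = global_feature @ w + b
    exp = np.exp(logits - logits.max())   # numerically stable Softmax
    probs = exp / exp.sum()
    return probs, int(np.argmax(probs))

rng = np.random.default_rng(2)
feature = rng.normal(size=128)                    # e.g. a PointNet-style global feature
w, b = rng.normal(size=(128, 10)), np.zeros(10)   # 10 hypothetical object categories
probs, label = classify(feature, w, b)
```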

2 Semantic Segmentation Based on 3D Point Cloud
    2.1 The Direct Method of Semantic Segmentation
    Semantic segmentation of a 3D point cloud recognizes 3D objects of different categories and assigns each a category label. Direct methods for point cloud semantic segmentation can be divided into: (1) methods based on multi-layer perceptrons, such as the SIFT-like module proposed by Jiang et al. [10] and the K-means clustering and k-NN algorithms used by Engelmann et al. [11]; (2) convolution-based methods, such as PointCNN proposed by Li et al. [12], the point-wise convolution operator proposed by Hua et al. [13], and the flexible and deformable point convolution network KPConv proposed by Thomas et al. [14]; (3) methods based on recurrent neural networks, such as the efficient semantic parsing network for large-scale point clouds proposed by Liu et al. [15]; (4) graph-based methods, such as the graph embedding module and pyramid attention network proposed by Kang et al. [16].
    2.2 Voxel-based method
    The voxel-based semantic segmentation method for 3D point clouds first converts the point cloud into voxel data and then performs semantic segmentation on the resulting voxels. Representing three-dimensional objects with voxels preserves their neighborhood information well, and convolution on voxels can be implemented by analogy with two-dimensional convolution, which helps processing accuracy. However, voxel-based methods have three disadvantages: (1) information loss is unavoidable when converting point clouds into voxels; (2) voxel-based methods usually have relatively high time and space complexity; (3) it is difficult to choose an appropriate voxel resolution for a specific task.
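    A minimal voxelization sketch, assuming axis-aligned cubic voxels, illustrates both the conversion and the resolution trade-off: a coarser grid loses more detail, while a finer grid grows cubically in memory.

```python
import numpy as np

def voxelize(points, voxel_size):
    """Quantize a point cloud into a boolean occupancy grid. Points that fall
    into the same voxel are merged, which is where the information loss occurs."""
    mins = points.min(axis=0)
    idx = np.floor((points - mins) / voxel_size).astype(int)   # voxel index per point
    grid = np.zeros(idx.max(axis=0) + 1, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True               # mark occupied voxels
    return grid

rng = np.random.default_rng(3)
cloud = rng.uniform(0.0, 1.0, size=(2048, 3))
coarse = voxelize(cloud, voxel_size=0.1)    # ~10^3 cells: compact but lossy
fine = voxelize(cloud, voxel_size=0.01)     # ~100^3 cells: memory grows cubically
```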
    2.3 Multi-view based approach
    The multi-view semantic segmentation method for 3D point clouds renders the 3D object represented by the point cloud from multiple viewpoints to generate several 2D views and then applies deep learning methods to segment those views. It therefore avoids convolving directly over complex 3D data and instead uses relatively mature 2D convolutions, taking full advantage of existing 2D convolutional networks without the need to design a complex new architecture. The method has two disadvantages: (1) the network's performance is very sensitive to viewing angle and occlusion; (2) information loss is inevitable when converting between point clouds and views.
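    The sketch below, written under the simplifying assumption of orthographic projection, renders one depth view of a point cloud from a chosen rotation; a real multi-view pipeline would repeat this for several viewpoints and feed the resulting images to a 2D segmentation network.

```python
import numpy as np

def depth_view(points, rotation, image_size=64):
    """Render one depth image of a point cloud: rotate the cloud to the chosen
    viewpoint, project onto the image plane, keep the nearest point per pixel."""
    p = points @ rotation.T
    xy = p[:, :2]
    span = xy.max(axis=0) - xy.min(axis=0) + 1e-9
    uv = ((xy - xy.min(axis=0)) / span * (image_size - 1)).astype(int)
    depth = np.full((image_size, image_size), np.inf)
    for (u, v), z in zip(uv, p[:, 2]):
        depth[v, u] = min(depth[v, u], z)   # nearest point wins
    depth[np.isinf(depth)] = 0.0            # empty pixels get a background value
    return depth

# Example: a view rotated 45 degrees about the z-axis.
theta = np.pi / 4
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0],
                  [np.sin(theta),  np.cos(theta), 0],
                  [0, 0, 1]])
view = depth_view(np.random.default_rng(4).normal(size=(2048, 3)), rot_z)
```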
    2.4 Method based on hybrid representation
    The hybrid-representation semantic segmentation method for 3D point clouds combines the methods above to process 3D data jointly, making full use of the advantages of each so that they complement one another and yield the best segmentation accuracy.

3 Instance Segmentation Based on 3D Point Cloud
    Instance segmentation is more difficult than semantic segmentation: it must not only recognize 3D objects with different semantics but also distinguish different instances within the same semantic category. Instance segmentation is usually a follow-up task to semantic segmentation.
    3.1 Method based on candidate regions
    The candidate-region-based 3D point cloud instance segmentation method completes the task in two steps: it first locates regions of interest in the scene and outputs candidate boxes, then predicts instances from those boxes. This approach is simple and straightforward to implement, but because the task is split into two stages, incorrectly predicted candidate regions from the first stage must be filtered out, and the time complexity, space complexity, and hardware requirements are usually relatively high; a sketch of such filtering follows this paragraph.
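    To make the filtering of redundant candidate regions concrete, here is a small sketch of axis-aligned 3D non-maximum suppression: overlapping, lower-scoring candidate boxes are discarded in favour of the highest-scoring ones. The box format and threshold are assumptions for illustration, not the procedure of any particular paper.

```python
import numpy as np

def iou_3d(a, b):
    """Axis-aligned 3D IoU for boxes given as (xmin, ymin, zmin, xmax, ymax, zmax)."""
    lo, hi = np.maximum(a[:3], b[:3]), np.minimum(a[3:], b[3:])
    inter = np.prod(np.clip(hi - lo, 0.0, None))
    vol = lambda box: np.prod(box[3:] - box[:3])
    return inter / (vol(a) + vol(b) - inter)

def nms_3d(boxes, scores, iou_thresh=0.5):
    """Keep high-scoring candidate boxes and drop lower-scoring boxes that
    overlap a kept box by more than the IoU threshold."""
    keep = []
    for i in np.argsort(scores)[::-1]:
        if all(iou_3d(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep
```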
    3.2 Candidate-free method
    The candidate-free 3D point cloud instance segmentation method does not split the task into two steps; instance objects are produced directly from point cloud features, possibly combined with semantic information. One idea is to first use a network such as PointNet to extract per-point features and then aggregate points with similar features into instance objects, as in the SGPN network proposed by Wang Weiyue et al. [17]; a simplified sketch of this grouping idea is given below. Another idea is to couple semantic segmentation and instance segmentation into a single task, as in the ASIS network of [18]. Candidate-free instance segmentation requires fewer computing resources than candidate-region-based methods, but its classification accuracy is lower.
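    The following is only a stand-in for the learned similarity grouping of SGPN: given hypothetical per-point embedding features (as a network such as PointNet might produce), points whose embeddings are close are greedily merged into the same instance, with an arbitrary distance threshold.

```python
import numpy as np

def greedy_instance_grouping(embeddings, threshold=0.5):
    """Assign an instance id to every point by greedily grouping points whose
    feature embeddings lie within `threshold` of an unassigned seed point."""
    labels = np.full(embeddings.shape[0], -1)
    next_id = 0
    for i in range(embeddings.shape[0]):
        if labels[i] != -1:
            continue                                            # already assigned
        dist = np.linalg.norm(embeddings - embeddings[i], axis=1)
        labels[(dist < threshold) & (labels == -1)] = next_id   # seed a new instance
        next_id += 1
    return labels

rng = np.random.default_rng(5)
# Two hypothetical clusters of per-point embeddings.
emb = np.vstack([rng.normal(0.0, 0.05, size=(100, 8)),
                 rng.normal(1.0, 0.05, size=(100, 8))])
instance_ids = greedy_instance_grouping(emb)   # roughly two distinct instance ids
```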

4 Summary
    The 3D point cloud is an important data type in autonomous driving and robotics. Compared with 2D image data, it better preserves the geometric relationships of objects and has unique advantages in the face of occlusion. However, a point cloud does not store the relationships between its points, and point cloud data volumes are generally large, so point cloud processing still faces many challenges. From the review of methods for the various point cloud tasks above, the following two aspects deserve further exploration in the future:
    (1) The disorder of point cloud data. Because the relationships between points in a point cloud are difficult to express, more complex models are needed to process them, and the time complexity, space complexity, and hardware requirements of such algorithms are very high. Future research can design faster and more accurate network models tailored to the characteristics of point cloud data.
    (2) Current point cloud networks are designed around small-scale point clouds, but the point clouds obtained in practice are generally large-scale and information-rich, so future research should study networks that can process large-scale point cloud data more effectively.

References:
[1] Xie Zexiao, Li Meihui. A review of machine learning in the field of point cloud-based 3D object recognition [J]. Journal of Ocean University of China (Natural Science Edition), 2021, 51(06): 125-130.
[2] Gu Junhua, Li Wei, Dong Yongfeng. A review of segmentation methods based on point cloud data [J]. Journal of Yanshan University, 2020, 44(02): 125-137.
[3] Shen Y, Feng C, Yang Y, et al. Mining point cloud local structures by kernel correlation and graph pooling [C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE Computer Society Press, 2018: 4548-4557.
[4] Tong Guofeng, Du Xiance, Li Yong, Chen Huairong, Zhang Qingchun. Three-dimensional point cloud classification of outdoor large scenes based on slice sampling and centroid distance histogram features [J]. China Laser, 2018, 45(10): 156-164.
[5] Qi C R, Su H, Mo K, et al. PointNet: Deep learning on point sets for 3D classification and segmentation [C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE Computer Society Press, 2017: 652-660.
[6] Qi C R, Yi L, Su H, et al. PointNet++: Deep hierarchical feature learning on point sets in a metric space [J]. Advances in Neural Information Processing Systems, 2017, 30.
[7] Zhang Rui, Li Guangyun, Li Minglei, Wang Shiyan. Research on laser point cloud classification method using PCA-BP algorithm [J]. Surveying and Mapping Bulletin, 2014(07): 23-26.
[8] Schapire R E. The strength of weak learnability [J]. Machine Learning, 1990, 5(2): 197-227.
[9] Breiman L. Bagging predictors [J]. Machine Learning, 1996, 24(2): 123-140.
[10] Jiang M, Wu Y, Zhao T, et al. PointSIFT: A SIFT-like network module for 3D point cloud semantic segmentation [J]. arXiv preprint arXiv:1807.00652, 2018.
[11] Engelmann F, Kontogianni T, Schult J, et al. Know what your neighbors do: 3D semantic segmentation of point clouds [C]. Proceedings of the European Conference on Computer Vision Workshops, 2018.
[12] Li Y, Bu R, Sun M, et al. PointCNN: Convolution on X-transformed points [J]. Advances in Neural Information Processing Systems, 2018, 31: 820-830.
[13] Hua B S, Tran M K, Yeung S K. Pointwise convolutional neural networks [C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE Computer Society Press, 2018: 984-993.
[14] Thomas H, Qi C R, Deschaud J E, et al. KPConv: Flexible and deformable convolution for point clouds [C]. Proceedings of the IEEE International Conference on Computer Vision, 2019: 6411-6420.
[15] Liu F, Li S, Zhang L, et al. 3DCNN-DQN-RNN: A deep reinforcement learning framework for semantic parsing of large-scale 3D point clouds [C]. Proceedings of the IEEE International Conference on Computer Vision, 2017: 5678-5687.
[16] Zhiheng K, Ning L. PyramNet: Point cloud pyramid attention network and graph embedding module for classification and segmentation [J]. arXiv preprint arXiv:1906.03299, 2019.
[17] Wang W, Yu R, Huang Q, et al. SGPN: Similarity group proposal network for 3D point cloud instance segmentation [C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE Computer Society Press, 2018: 2569-2578.
[18] Wang X, Liu S, Shen X, et al. Associatively segmenting instances and semantics in point clouds [C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE Computer Society Press, 2019: 4096-4105.
