Summary of Semantic Segmentation Algorithms for Point Cloud Data (Traditional Methods + Methods Based on Deep Learning)

Table of contents

1 Traditional Methods

1.1 Segmentation based on edge information

1.2 Segmentation based on model fitting

1.3 Segmentation based on region growing

1.4 Attribute-Based Segmentation

1.5 Segmentation based on graph optimization

2 Deep Learning-Based Approaches

2.1 Projection-based segmentation

2.1.1 Multi-view representation

2.1.2 Spherical representation

2.2 Voxel-based segmentation

2.3 Point-based segmentation

2.3.1 Pointwise MLP method

2.3.2 Point convolution method

2.3.3 RNN-based method

2.3.4 Segmentation based on graph optimization

3 Summary


Compared with 2D image data, 3D point clouds are mostly irregular, unstructured, and unordered. They retain the original geometric information of the 3D space, so the shape and size of objects can be recovered, but segmenting features such as spatial location, geometric attributes, and semantic attributes raises additional difficulties. Moreover, during acquisition, changes in illumination and sensor angle cause the density of the collected points to be uneven, so point cloud semantic segmentation still faces a series of challenges.

Early on, many researchers worked on point cloud segmentation. These traditional segmentation methods achieved certain results, but most are limited to specific scenarios, rely on prior knowledge, and are time-consuming, so they are hard to generalize and apply widely.

With the rapid development of deep learning in recent years, the focus of point cloud semantic segmentation research has shifted to deep-learning-based methods. Compared with early traditional methods, these methods greatly improve segmentation accuracy. In particular, after the idea of processing data directly on the point cloud was proposed, more and more researchers have preferred to operate directly on the points, in order to make full use of the rich spatial information contained in the 3D point cloud.

1 Traditional Methods

Before deep learning was applied to point cloud segmentation, quite a few methods attempted the task. These traditional methods mainly rely on geometric constraints and statistical rules over hand-designed object features, dividing the point cloud data into several non-overlapping regions that correspond to the objects in the scene. Although their results are not ideal, they are still instructive. These methods can be divided into the following five categories.

1.1 Segmentation based on edge information

Edge-based segmentation identifies edge information by detecting points of abrupt change (e.g., in intensity or surface orientation), thereby outlining the shapes of objects, and then groups the points delimited by these edges to determine the final segmentation result.

This approach is fast, but its accuracy is relatively low; it is very sensitive to point clouds with uneven or sparse density, and it is easily disturbed by noise.
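One simple way to realize the "abrupt change" idea on raw geometry is to flag points whose neighborhood contains sharply disagreeing surface normals. The following is a minimal sketch, not any specific published algorithm; the function name, neighborhood size, and angle threshold are illustrative assumptions:

```python
import numpy as np

def edge_points(points, k=8, angle_thresh_deg=45.0):
    """Flag points whose neighbourhood contains sharply disagreeing
    surface normals -- a simple proxy for a geometric edge/crease."""
    n = len(points)
    # brute-force k nearest neighbours (fine for small clouds)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    knn = np.argsort(d2, axis=1)[:, 1:k + 1]   # skip self at index 0

    # classic PCA normal estimation: smallest-variance direction
    normals = np.empty_like(points)
    for i in range(n):
        nb = points[knn[i]] - points[knn[i]].mean(axis=0)
        normals[i] = np.linalg.svd(nb, full_matrices=False)[2][-1]

    # edge score: the largest angle between any two neighbour normals
    cos_t = np.cos(np.radians(angle_thresh_deg))
    flags = np.zeros(n, dtype=bool)
    for i in range(n):
        cos = np.abs(normals[knn[i]] @ normals[knn[i]].T)
        flags[i] = cos.min() < cos_t
    return flags
```

On a cloud sampled from two planes meeting at a right angle, points along the crease are flagged while flat interior points are not; the sensitivity to sparse or noisy clouds mentioned above comes from the normal estimation step.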

1.2 Segmentation based on model fitting

Segmentation based on model fitting starts from the geometric shape of the point cloud data: the points are compared and matched against known geometric primitives (such as cylinders, cones, and spheres), and points with the same mathematical characteristics are grouped into one class, thereby segmenting the known geometric shapes contained in the point cloud.

This approach is grounded in mathematical models. Compared with edge-based segmentation, it is both less sensitive to noise and faster to compute.
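The classic instance of model fitting is RANSAC: repeatedly sample a minimal set of points, build the primitive through them, and keep the model supported by the most inliers. A minimal plane-fitting sketch (iteration count and distance threshold are illustrative assumptions):

```python
import numpy as np

def ransac_plane(points, iters=200, dist_thresh=0.05, rng=None):
    """RANSAC plane fitting: sample 3 points, form the plane through
    them, and keep the hypothesis with the most inliers."""
    rng = np.random.default_rng(0) if rng is None else rng
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-12:              # degenerate (collinear) sample
            continue
        normal /= norm
        dist = np.abs((points - p0) @ normal)   # point-to-plane distance
        inliers = dist < dist_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers
```

The same sample-and-score loop extends to spheres or cylinders by swapping in the corresponding minimal parameterization; robustness to noise comes from the inlier voting, which matches the noise tolerance claimed above.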

1.3 Segmentation based on region growing

Region-based segmentation partitions the point cloud into regions, grouping points with small differences into the same region according to some similarity criterion. It can be further divided into seed-based and non-seed-based methods.

The seed-based method first selects several seed points as starting points. According to preset growth rules, neighboring points with high feature similarity are added around each seed so that the region grows and spreads; each newly added point then serves as a new seed, and the growth process is repeated. Seed-based segmentation is strongly affected by noise and is computationally expensive. In addition, its accuracy depends heavily on the choice of initial seed points, so selecting suitable seeds is both the key and the difficulty of this method.
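The grow-and-repeat loop described above can be sketched as a breadth-first search from a seed. This is a toy version with a single similarity criterion (height difference standing in for any feature similarity); the function name, radius, and tolerance are illustrative assumptions:

```python
import numpy as np
from collections import deque

def region_grow(points, seed, radius=0.25, z_tol=0.05):
    """Grow one region from a seed: repeatedly absorb neighbours within
    `radius` whose height differs from the seed's by less than `z_tol`;
    each absorbed point becomes a fresh seed for further growth."""
    n = len(points)
    in_region = np.zeros(n, dtype=bool)
    in_region[seed] = True
    queue = deque([seed])
    while queue:
        i = queue.popleft()
        d = np.linalg.norm(points - points[i], axis=1)
        for j in np.nonzero((d < radius) & ~in_region)[0]:
            if abs(points[j, 2] - points[seed, 2]) < z_tol:
                in_region[j] = True
                queue.append(j)   # new member becomes a new seed
    return in_region
```

Seeding inside one of two separated flat patches grows a region covering exactly that patch; picking a bad seed (e.g., a noise point) would grow a wrong region, which is the seed-selection difficulty noted above.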

The non-seed-based method first assigns all points in the spatial domain to a single region and then subdivides it further. Compared with seed-based segmentation, its drawbacks are that the subdivision boundaries are hard to locate, over-segmentation can occur, and accurate segmentation demands more prior knowledge.

1.4 Attribute-Based Segmentation

Attribute-based segmentation first computes attributes of the points and defines a feature vector for each point; points with similar feature vectors are then clustered into the same class, completing the segmentation.

This approach copes well with noise and outliers, but its disadvantages are that it requires a high point cloud density and its computation takes a long time.
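The cluster-by-attributes step can be sketched with a tiny k-means loop over per-point feature vectors. Here the attribute is just height, a toy stand-in for richer features such as normals or curvature; the function name and initialization scheme are my assumptions:

```python
import numpy as np

def segment_by_attributes(features, k=2, iters=20):
    """Tiny k-means over per-point attribute vectors: points with
    similar attributes end up in the same segment. Centers start
    spread evenly between the attribute extremes (deterministic)."""
    centers = np.linspace(features.min(axis=0), features.max(axis=0), k)
    for _ in range(iters):
        # distance of every feature vector to every center
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):          # move each center to its cluster mean
            if (labels == c).any():
                centers[c] = features[labels == c].mean(axis=0)
    return labels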

1.5 Segmentation based on graph optimization

Graph-based segmentation converts the point cloud into graph data by establishing relationships between points, and then performs convolution on this graph, i.e., a suitable graph convolutional neural network is selected for representation learning.

The advantage of this approach is that graph convolution can aggregate the point-set features of an object while maintaining translation invariance in 3D space, but how to properly establish the relationships between points remains a difficult open problem.

2 Deep Learning-Based Approaches

With the development of deep learning, various fields of computer vision have become increasingly inseparable from deep learning. The technology of using deep learning to process 2D image data is very mature and has achieved good results. In recent years, more and more researchers have focused on processing point clouds with deep neural networks.

A two-dimensional digital image is composed of a matrix of pixels, which can be easily represented in a computer. However, three-dimensional point cloud data is composed of disordered points in space, which is difficult to be directly processed by a computer. Therefore, it is necessary to transform the point cloud into a regular structure suitable for convolutional neural network (Convolutional Neural Network, CNN) processing.

There are mainly the following methods: projection-based, voxel-based and point-based segmentation.

2.1 Projection-based segmentation

2.1.1 Multi-view representation

Early deep learning methods projected the 3D point cloud onto 2D planes and then processed the data with CNN models. This sidesteps the difficulty of processing 3D point clouds directly: a CNN extracts features from each planar projection, the features of the multi-view projections are aggregated, and the semantic segmentation result is obtained through pooling and fully connected layers.

Fig. 2.1 Representative network structure based on the multi-view approach

Because the 3D point cloud is projected onto 2D images, the result is affected by the choice of viewpoints and projection angles, and part of the available spatial geometric information is lost in the images. This reduces segmentation accuracy to some extent, and it is a drawback that this class of algorithms can hardly avoid.
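The projection step itself, for one of the views such a pipeline would render, can be sketched as an orthographic depth image. The function name, resolution, top-down view direction, and z-buffer rule are illustrative assumptions, not any specific network's preprocessing:

```python
import numpy as np

def depth_image(points, res=32):
    """Orthographic projection of a cloud onto the XY plane as a
    res x res depth image; a per-pixel z-buffer keeps the top surface."""
    xy, z = points[:, :2], points[:, 2]
    lo, hi = xy.min(axis=0), xy.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)      # guard degenerate extents
    pix = np.clip(((xy - lo) / span * res).astype(int), 0, res - 1)
    img = np.full((res, res), -np.inf)
    for (u, v), depth in zip(pix, z):
        img[v, u] = max(img[v, u], depth)       # keep the highest point
    return img
```

The z-buffer line is where the information loss happens: of two points landing in the same pixel, only one survives, and occluded geometry is invisible to the 2D CNN.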

2.1.2 Spherical representation

The point cloud scanned by a LiDAR is geometrically similar to a hollow cylinder. Viewed perpendicular to the cylinder's axis, it can be unrolled into a surrounding planar image, so a spherical projection image can be used to represent the 3D point cloud.

The prominent feature of this method is its speed. Compared with multi-view projection, spherical projection retains more point cloud information, but it still cannot solve the occlusion problem found in multi-view methods.
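A minimal spherical (range-image) projection in the spirit of SqueezeSeg/RangeNet++ follows; the image size, field of view, and function name are illustrative assumptions, not the papers' exact settings:

```python
import numpy as np

def range_image(points, h=16, w=360, fov_up=15.0, fov_down=-15.0):
    """Spherical projection: map each 3D point to an (elevation,
    azimuth) pixel that stores its range to the sensor."""
    x, y, z = points.T
    r = np.linalg.norm(points, axis=1)
    yaw = np.degrees(np.arctan2(y, x))       # azimuth in [-180, 180]
    pitch = np.degrees(np.arcsin(z / r))     # elevation angle
    u = np.clip(((yaw + 180.0) / 360.0 * w).astype(int), 0, w - 1)
    v = np.clip(((fov_up - pitch) / (fov_up - fov_down) * h).astype(int),
                0, h - 1)
    img = np.zeros((h, w))
    img[v, u] = r                            # last write wins per pixel
    return img
```

The resulting h x w image is dense and regular, which is why 2D CNNs run on it so fast; but as with multi-view projection, points that collide in a pixel occlude each other.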

2.2 Voxel-based segmentation

Voxels (occupancy grids) are a structured representation that divides the original point cloud into cells of a fixed spatial size.

Fig. 2.2 Representative network structure based on the voxel approach

Overall, a voxelized point cloud preserves the neighborhood structure of the original points well, the voxel representation scales well, and segmentation results are good. However, voxelization itself introduces discretization artifacts and information loss. Moreover, although voxelization turns the point cloud into regular data, choosing a high resolution brings low computational efficiency and high memory usage, which makes it difficult to select a grid resolution that balances all of these aspects.
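The quantization at the heart of this representation can be sketched in a few lines (function name and voxel size are illustrative assumptions):

```python
import numpy as np

def voxelize(points, voxel_size=0.5):
    """Map each point to an integer voxel index floor(p / voxel_size)
    and return the set of occupied voxels -- the regular structure a
    3D CNN can consume."""
    idx = np.floor(points / voxel_size).astype(int)
    return np.unique(idx, axis=0)
```

Both trade-offs discussed above are visible here: all points falling in one cell collapse to a single occupied voxel (information loss), while halving `voxel_size` multiplies the number of cells of a dense grid by eight (memory and compute cost).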

2.3 Point-based segmentation

Since both projection-based and voxel-based methods suffer from limitations such as loss of spatial information and reduced structural resolution, a more efficient way to handle point clouds is needed. Point-based segmentation can make full use of the geometric structure of the point cloud while improving computational efficiency.

Current point-based segmentation methods are broadly classified into pointwise MLP methods, point convolution methods, RNN-based methods, and graph-based methods.

Fig. 2.3 Representative network structure based on the point-based approach

2.3.1 Pointwise MLP method

PointNet operates directly on point cloud data: a shared multi-layer perceptron (MLP) extracts features for each point of the input, and max pooling produces a global feature. A key component is the T-Net: a first T-Net builds a transformation matrix to spatially align the input points, addressing invariance to point cloud transformations, and a second T-Net builds a transformation matrix to align the feature space.
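The shared-MLP-plus-max-pool trunk can be sketched as follows. This is a toy numpy version with random stand-in weights (not PointNet's learned parameters) and with the T-Nets omitted; it only illustrates why the max-pool global feature is invariant to point ordering:

```python
import numpy as np

def pointnet_global_feature(points, w1, w2):
    """Toy PointNet trunk: the same (shared) MLP is applied to every
    point independently; max pooling, a symmetric function, then yields
    a global feature invariant to the ordering of the input points."""
    h = np.maximum(points @ w1, 0.0)   # shared layer 1 + ReLU
    h = np.maximum(h @ w2, 0.0)        # shared layer 2 + ReLU
    return h.max(axis=0)               # order-invariant max pool
```

Feeding the same cloud in any permutation yields bit-identical global features, which is how PointNet handles the disorder of point sets without any regularization step.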

PointNet classifies and segments point clouds using their global features, but it ignores local features. To address this, PointNet++ was proposed: points are organized hierarchically and grouped at each layer, allowing the network to gradually learn point features from increasingly large local regions. In addition, to handle uneven density in the point cloud, a multi-resolution grouping scheme was designed whose feature consists of two concatenated vectors: one obtained by feature extraction over all points of the local region, its local-global descriptor, and one obtained by feature extraction over a subset, its local-local descriptor. This scheme improves on the computational cost of multi-scale grouping. Subsequent improvements mostly focus on learning richer contextual information and local structure around each point, developing mainly into neighboring-feature pooling, attention-based aggregation, and local-global feature concatenation methods.

Although PointNet ignores local feature information and is therefore difficult to apply to complex scenes or point clouds with uneven density, its pioneering ideas provided a valuable reference for subsequent research on point cloud semantic segmentation.

2.3.2 Point convolution method

The advantage of the convolution operation is that it extracts the spatial information of regular data well, but the inherent irregularity of point cloud data prevents ordinary convolution from being applied directly to the raw points.

PointCNN designs an X-transformation to regularize point cloud data, reweighting and permuting the features associated with each point while retaining the spatial position information, and then applies traditional convolution to the transformed points. PointCNN can exploit the spatially local correlation of data densely represented on a grid, so it achieves strong performance in point cloud segmentation and classification. However, convolving the kernel directly over the features associated with these points leads to a loss of shape information, and the results can still differ with the ordering of the input points.

There are also methods that directly generalize the traditional convolution operation. The kernel point convolution network KPConv proposed by Thomas et al. [15] takes 3D points in the point cloud space as convolution centers and uses the relative positions (Euclidean distances) between coordinate points; by placing multiple convolution centers and assigning each point a weight according to its distance, it preserves the position information of the actual 3D space. Two kinds of kernels are provided: a rigid kernel for simple, evenly distributed data, and a deformable kernel whose point positions adapt to handle complex tasks with positional variation.

2.3.3 RNN-based method

Recurrent neural networks (RNNs) are used in point cloud semantic segmentation mainly to capture the inherent contextual features of the point cloud itself; such spatial context is very important for improving segmentation performance.

Ye et al. [17] proposed a new end-to-end method for semantic segmentation of unstructured point clouds. They construct an efficient pyramid pooling module to extract local information from the 3D point cloud and then extract global spatial dependencies through bidirectional RNNs. Two RNNs scan the 3D space in different directions to extract information, and hierarchical sequential RNNs in both directions fuse local information at different scales to obtain a wider range of context, finally achieving good 3D semantic segmentation results. However, fusing too many local features loses the rich geometric detail of the original point cloud.

2.3.4 Segmentation based on graph optimization

As in the traditional case, the graph-based segmentation method converts the point cloud into graph data by establishing relationships between points, and then performs convolution on the graph, i.e., a suitable graph convolutional neural network is selected for representation learning.

The idea of the graph-based method is to treat each point in the cloud as a vertex of the graph and to form directed edges from it to its neighborhood points, so as to capture the underlying shape and geometry of the point cloud.
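The vertex-and-directed-edge construction described above is usually a k-nearest-neighbor graph. A minimal sketch (brute-force distances, function name assumed; graph networks such as DGCNN rebuild a graph like this in feature space at every layer):

```python
import numpy as np

def knn_graph(points, k=3):
    """Directed graph over a point cloud: one edge from each point to
    each of its k nearest neighbours."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    nbrs = np.argsort(d2, axis=1)[:, 1:k + 1]   # drop self at index 0
    return [(i, int(j)) for i in range(len(points)) for j in nbrs[i]]
```

The choice of `k` (or of a radius instead of a count) is exactly the "how to establish the relationships between points" problem: too small a neighborhood fragments objects, too large a one blurs them together.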

The advantage of this approach is that graph convolution can aggregate the point-set features of an object while maintaining translation invariance in 3D space, but how to properly establish the relationships between points remains a difficult open problem.

3 Summary

Compared with ordinary images, point cloud data is sparse, irregular, and unordered, and it places higher demands on algorithmic efficiency and memory usage; traditional algorithms have difficulty processing and modeling 3D point clouds. Compared with traditional methods, deep-learning-based feature extraction for point clouds applies to more scenes and segments better, and 3D point cloud classification and segmentation algorithms based on graph convolutional neural networks have also attracted increasing attention and research.

Point-based networks are currently the most common research direction, and joint point-voxel or other hybrid representations also show good segmentation performance. Fusing multiple methods brings more possibilities to the field of point cloud segmentation. Some work has already tried to combine the advantages of different deep learning methods, but without notable success so far; the fusion of different methods therefore remains the difficulty and focus of future research on point cloud semantic segmentation.

References

  1. Qu Y H, Pan Q, Yan J G. Flight path planning of UAV based on heuristic search and genetic algorithms[C]. IECON 2005: 31st Annual Conference of the IEEE Industrial Electronics Society. IEEE, 2005.
  2. Lawin F J, Danelljan M, Tosteberg P, Bhat G, Khan F S, Felsberg M. Deep projective 3D semantic segmentation[C]. CAIP, 2017.
  3. Chen X, Ma H, Wan J, et al. Multi-view 3D object detection network for autonomous driving[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 1907-1915.
  4. Isacson D, Smedh K, Nikberg M, et al. Long-term follow-up of the AVOD randomized trial of antibiotic avoidance in uncomplicated diverticulitis[J]. British Journal of Surgery, 2019, 106(11): 1542-1548.
  5. Boulch A, Guerry J, Le Saux B, et al. SnapNet: 3D point cloud semantic labeling with 2D deep segmentation networks[J]. Computers & Graphics, 2018, 71: 189-198.
  6. Wu B, Wan A, Yue X, Keutzer K. SqueezeSeg: Convolutional neural nets with recurrent CRF for real-time road-object segmentation from 3D LiDAR point cloud[C]. Proceedings of the IEEE International Conference on Robotics and Automation, 2018: 1887-1893.
  7. Milioto A, Vizzo I, Behley J, Stachniss C. RangeNet++: Fast and accurate LiDAR semantic segmentation[C]. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2019: 4213-4220.
  8. Huang J, You S. Point cloud labeling using 3D convolutional neural network[C]. ICPR, 2016.
  9. Liu B, Wang M, Foroosh H, et al. Sparse convolutional neural networks[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015: 806-814.
  10. Klokov R, Lempitsky V. Escape from cells: Deep Kd-networks for the recognition of 3D point cloud models[C]. Proceedings of the IEEE International Conference on Computer Vision, 2017: 863-872.
  11. Riegler G, Osman Ulusoy A, Geiger A. OctNet: Learning deep 3D representations at high resolutions[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 3577-3586.
  12. Qi C R, Su H, Mo K, et al. PointNet: Deep learning on point sets for 3D classification and segmentation[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 652-660.
  13. Qi C R, Yi L, Su H, et al. PointNet++: Deep hierarchical feature learning on point sets in a metric space[C]. Advances in Neural Information Processing Systems, 2017: 5099-5108.
  14. Li Y, et al. PointCNN: Convolution on X-transformed points[C]. Advances in Neural Information Processing Systems 31, 2018.
  15. Thomas H, et al. KPConv: Flexible and deformable convolution for point clouds[C]. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.
  16. Engelmann F, Kontogianni T, Hermans A, Leibe B. Exploring spatial context for 3D semantic segmentation of point clouds[C]. Proceedings of the IEEE International Conference on Computer Vision, 2017: 716-724.
  17. Landrieu L, Simonovsky M. Large-scale point cloud semantic segmentation with superpoint graphs[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 4558-4567.
  18. Landrieu L, Boussaha M. Point cloud oversegmentation with graph-structured deep metric learning[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 7432-7441.
