Paper Interpretation | PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation (CVPR 2017)

Original | By BFT Robot


01

Background

Traditional convolutional architectures require regular input data formats, such as image grids or 3D voxels, for weight sharing and other kernel optimizations.

Because point clouds and meshes are not in a regular format, researchers typically convert such data into regular 3D voxel grids or collections of images before feeding it into a deep network.

However, such data representation transformations lead to unnecessary data redundancy and introduce quantization errors, which may mask the natural invariance of the data.

PointNet provides a new deep network architecture that processes unordered 3D point sets directly, avoiding the need to convert point clouds into regular 3D voxel grids or image collections, and it achieves efficient and effective performance across a variety of applications.


Figure 1 Application of PointNet

02

Innovation

This paper proposes PointNet, a new deep network architecture for processing unordered 3D point sets, which provides a unified and efficient way to reason about 3D geometric data such as point clouds and meshes.

PointNet can handle a variety of applications, ranging from object classification and part segmentation to scene semantic parsing. Empirically, it shows performance comparable to, or better than, the state of the art. Theoretical analysis also reveals what the network learns and why it is robust to perturbations and corruption of the input.

03

Algorithm Introduction

PointNet is a deep neural network architecture for directly processing unordered 3D point cloud data.

Its network structure includes three key modules:

1. Max pooling layer: a symmetric function that aggregates information from all of the points.

2. Local and global information combination structure: concatenates each point's local features with the aggregated global feature (needed for segmentation).

3. Two joint alignment networks: align the input points and the point features, respectively.

In order to make the model invariant to input permutations, PointNet uses a symmetric function to aggregate the information from each point. A symmetric function takes n vectors as input and outputs a new vector that is invariant to the order of the inputs; for example, addition and multiplication are symmetric binary functions. The complete network structure of PointNet is shown in Figure 2; the classification network and the segmentation network share most of their structure.
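In the paper's notation, the idea is to approximate a set function f with a per-point function h followed by a symmetric aggregation g (Equation 1 of the paper):

f({x1, ..., xn}) ≈ g(h(x1), ..., h(xn))

In PointNet, h is realized by the shared MLP and g by max pooling.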


Figure 2 PointNet network structure

In PointNet, the point cloud is first processed by a shared multi-layer perceptron (MLP) that extracts local features for each point: the coordinates and any other attributes of a point (such as color or a normal vector) are passed through several fully connected layers, which map them to a higher-level feature vector.

The MLP output is then fed into a max pooling layer that aggregates information over the entire point cloud, reducing the per-point feature vectors to a single global feature vector that is invariant to the ordering of the points. Finally, the global feature vector is fed into a classification or segmentation head for the specific task: object classification, shape part segmentation, scene semantic parsing, etc.
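To make this pipeline concrete, here is a minimal PointNet-style classifier in PyTorch. It is a sketch under stated assumptions, not the authors' reference implementation: the two T-Nets are omitted, and details such as BatchNorm placement are simplified, though the 64-128-1024 point MLP and 512-256 head follow the layer widths in Figure 2.

```python
import torch
import torch.nn as nn

class PointNetClassifier(nn.Module):
    def __init__(self, num_classes: int = 40):
        super().__init__()
        # Shared per-point MLP as 1x1 convolutions: (B, 3, N) -> (B, 1024, N).
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        # Classification head applied to the global feature vector.
        self.head = nn.Sequential(
            nn.Linear(1024, 512), nn.BatchNorm1d(512), nn.ReLU(),
            nn.Linear(512, 256), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Dropout(p=0.3),  # keep ratio 0.7, as in the paper
            nn.Linear(256, num_classes),
        )

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        # xyz: (B, N, 3) point coordinates; the order of the N points is arbitrary.
        feats = self.point_mlp(xyz.transpose(1, 2))  # per-point features
        global_feat = feats.max(dim=2).values        # symmetric max pooling
        return self.head(global_feat)                # (B, num_classes) logits

logits = PointNetClassifier()(torch.rand(2, 1024, 3))  # e.g. 40 ModelNet40 classes
```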

The symmetric function aggregates the information of all points to extract the global features of the point cloud. Specifically, each point's attributes are passed through the shared MLP to obtain that point's local features, and the symmetric function then aggregates these per-point feature vectors into a single global feature vector.

As noted above, a symmetric function's output does not depend on the order of its n inputs. In PointNet, the symmetric function takes the form of max pooling: for each feature dimension, the maximum value of that dimension across all points is taken as that dimension's value in the global feature vector. In this way, max pooling aggregates the information of all points into a global feature for the point cloud.
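A toy check of this invariance (tensor sizes are arbitrary):

```python
import torch

feats = torch.rand(1, 64, 100)               # (batch, feature dim, num points)
perm = torch.randperm(100)                   # a random reordering of the points
pooled = feats.max(dim=2).values             # global feature
pooled_shuffled = feats[:, :, perm].max(dim=2).values
assert torch.equal(pooled, pooled_shuffled)  # max pooling ignores point order
```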

A joint alignment network is used to align the input points and the point features. Specifically, PointNet includes two such networks: one aligns the input points and the other aligns point features. For point alignment, PointNet uses a small network (T-Net) to predict an affine transformation matrix, which is applied directly to the coordinates of the input points.

The T-Net itself resembles the main PointNet network and is composed of the same basic modules: point-independent feature extraction, max pooling, and fully connected layers. For feature alignment, PointNet inserts a second alignment network that predicts a feature transformation matrix to align features from different input point clouds.

Since the feature transformation matrix has a much higher dimensionality than the spatial transformation matrix (64×64 versus 3×3), PointNet adds a regularization term that constrains the feature transformation matrix to be close to an orthogonal matrix, which improves the stability of optimization and the performance of the model.
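The paper writes this regularizer as L_reg = ||I − A·Aᵀ||²_F, where A is the feature transformation matrix predicted by the T-Net, and adds it to the task loss with weight 0.001. A PyTorch sketch (the function name is our own):

```python
import torch

def feature_transform_regularizer(trans: torch.Tensor) -> torch.Tensor:
    # trans: (B, d, d) predicted feature transforms; d = 64 in the paper.
    d = trans.size(1)
    eye = torch.eye(d, device=trans.device).unsqueeze(0)  # identity, (1, d, d)
    diff = eye - torch.bmm(trans, trans.transpose(1, 2))  # I - A A^T
    return (diff ** 2).sum(dim=(1, 2)).mean()             # squared Frobenius norm

loss_reg = feature_transform_regularizer(torch.rand(4, 64, 64))
```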

04

Experiments

The experimental process mainly includes the following parts:

Data preparation: Collect and preprocess 3D point cloud datasets; the clouds can come from different 3D scanners or from simulation. Preprocessing (e.g., normalizing coordinates and unifying the number of points and attributes per cloud) ensures the data is consistent and usable.
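As one concrete preprocessing step, the paper normalizes each sampled point cloud into a unit sphere; a minimal NumPy sketch:

```python
import numpy as np

def normalize_unit_sphere(points: np.ndarray) -> np.ndarray:
    # points: (N, 3) coordinates of one cloud.
    points = points - points.mean(axis=0)          # center at the origin
    scale = np.linalg.norm(points, axis=1).max()   # distance of the farthest point
    return points / scale                          # fits inside the unit sphere

cloud = normalize_unit_sphere(np.random.rand(1024, 3))
```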

Network architecture construction: Build the PointNet network structure, which consists mainly of shared fully connected layers (MLPs) and a max pooling layer. Depending on the task, the output layers may need adjusting, e.g., adding a classification or segmentation head.

Training: PointNet is trained on the prepared dataset using the Adam optimizer and a cross-entropy loss, optimizing the network to perform well on the specific task.
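A hedged sketch of such a training step, assuming a `model` like the classifier above and a `loader` yielding (points, labels) batches:

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
model.train()
for points, labels in loader:               # points: (B, N, 3); labels: (B,)
    optimizer.zero_grad()
    logits = model(points)
    loss = F.cross_entropy(logits, labels)  # classification loss
    # If the model also returns the 64x64 feature transform `trans`, add the
    # orthogonality term: loss += 0.001 * feature_transform_regularizer(trans)
    loss.backward()
    optimizer.step()
```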

Evaluation: After training, PointNet is evaluated on the test set. Calculate the accuracy of the classification task or the IoU (Intersection over Union) of the segmentation task to measure the performance of the network.
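For segmentation, per-shape mean IoU can be computed along these lines (conventions vary between codebases; here a part absent from both prediction and ground truth counts as IoU 1, a common choice):

```python
import numpy as np

def shape_miou(pred: np.ndarray, target: np.ndarray, num_parts: int) -> float:
    # pred, target: (N,) integer part labels for the points of one shape.
    ious = []
    for part in range(num_parts):
        inter = np.logical_and(pred == part, target == part).sum()
        union = np.logical_or(pred == part, target == part).sum()
        ious.append(1.0 if union == 0 else inter / union)  # absent part -> IoU 1
    return float(np.mean(ious))
```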

Hyperparameter tuning: For some hyperparameters, such as learning rate, batch size, etc., tuning may be required to optimize the performance of the network.

Comparative experiments: Conduct comparative experiments with other existing 3D point cloud processing methods to prove the superior performance of PointNet.

Visual analysis: Understand the network's output through visualization, such as inspecting point cloud samples the network classifies correctly, or the semantic segmentation results it produces.

Point cloud reconstruction experiment: Conduct experiments on the ShapeNet dataset to compare the reconstruction performance of PointNet with several other network architectures.


Figure 3 PointNet robustness test

Figure 3 shows PointNet's robustness tests: classification accuracy degrades gracefully as input points are randomly dropped, and the network also tolerates outlier points and small perturbations of the point coordinates.


This is an original article and its copyright belongs to BFT Robot. Please contact us if you wish to reprint it. If you have any questions about the content, contact us and we will respond promptly.
