Interpretation of PointNet

Problems solved by PointNet:

As shown in the figure in the paper:

1. Classification of point clouds (identifying what object the whole point cloud represents)

2. Part segmentation of point clouds (splitting the object represented by the point cloud into its constituent parts)

3. Semantic segmentation of point clouds (distinguishing the different objects in a 3D point-cloud scene, shown with different colors)

The input and output effects shown in the paper:

1. The effect of part segmentation (left: a partial, incomplete input point cloud; right: a complete input point cloud)

2. The effect of semantic segmentation

The three characteristics of point clouds discussed in the paper:

1. Unorderedness of points: a point cloud is an unordered set, with no fixed order among its points. For example, swapping two points still yields the same point cloud, so the model's output must not depend on the order in which points are fed in.

2. Interaction among points: although the points in a cloud are discrete, together they form the outline of an object or environment. Points are not isolated from one another; neighboring points form meaningful subsets. The model therefore needs to capture local structures from nearby points, as well as the interactions between those local structures.

3. Transformation invariance: after the point cloud undergoes a rigid transformation (rotation and translation), the classification or segmentation result should remain unchanged.
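The unorderedness property can be demonstrated with a tiny sketch (plain NumPy, not the paper's code): a symmetric aggregation function such as channel-wise max pooling produces the same global feature no matter how the points are permuted.

```python
import numpy as np

def symmetric_feature(points):
    """Aggregate per-point features with max pooling, a symmetric function.

    points: (n, d) array. Returns a d-dimensional global feature that does
    not depend on the order of the rows (the points).
    """
    return points.max(axis=0)

cloud = np.array([[0.0, 1.0,  2.0],
                  [3.0, -1.0, 0.5],
                  [1.0, 4.0, -2.0]])
shuffled = cloud[[2, 0, 1]]  # same points, different order

# Swapping points leaves the aggregated feature unchanged.
assert np.array_equal(symmetric_feature(cloud), symmetric_feature(shuffled))
```

Mean or sum pooling would also be symmetric; the paper chooses max pooling as its symmetric function.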

The key to solving point-cloud classification and segmentation tasks is how to extract features from point cloud data:

        If we skip the MLP that raises each point to a higher dimension and directly apply the pooling operation, then n points would collapse into a single 3-dimensional feature (one vector such as (2, 3, 4)), which would then have to drive a K-way classification. That is clearly unreasonable: too much feature information is lost. Therefore, the dimensionality must be raised before pooling, so that enough features survive the aggregation.

        The paper formalizes this as approximating a function on the point set by a symmetric composition, f({x1, ..., xn}) ≈ γ(g(h(x1), ..., h(xn))): the h function is the per-point dimension-raising operation, the g function is the symmetric pooling operation that extracts the global feature, and γ (the y function) is the classification operation.
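The h/g/γ decomposition can be sketched as follows. This is a minimal illustration with random weights, assuming a single linear-plus-ReLU layer stands in for the per-point MLP h; the real network stacks several shared layers.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_feat, k_classes = 100, 3, 1024, 10

# h: a shared per-point "MLP" (one linear layer + ReLU here, for brevity)
W_h = rng.standard_normal((d_in, d_feat))
def h(points):                       # (n, 3) -> (n, 1024)
    return np.maximum(points @ W_h, 0.0)

# g: symmetric max pooling over the n points -> one global feature
def g(features):                     # (n, 1024) -> (1024,)
    return features.max(axis=0)

# gamma (the "y function"): a classifier on the global feature
W_y = rng.standard_normal((d_feat, k_classes))
def gamma(global_feat):              # (1024,) -> (k_classes,)
    return global_feat @ W_y

points = rng.standard_normal((n, d_in))
scores = gamma(g(h(points)))         # one score per class
assert scores.shape == (k_classes,)
```

Because g is symmetric, permuting the input points leaves `scores` unchanged, which is exactly the unorderedness property discussed above.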

        With the above explanation, the PointNet architecture is easy to understand. PointNet consists of two parts: a classification network and a segmentation network. The classification network takes n points as input, applies input and feature transformations, and then aggregates point features by max pooling; its output is a classification score for each of the K classes. The segmentation network is an extension of the classification network: it concatenates global and per-point local features and outputs a score for each point. The shared multi-layer perceptron (MLP) consists of 5 hidden layers with sizes 64, 64, 64, 128, and 1024, and all points share one copy of this MLP. The MLPs near the outputs of the classification and segmentation networks consist of two layers of sizes 512 and 256.
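The concatenation step in the segmentation branch is just a shape manipulation: the single global feature is repeated for every point and joined with each point's local feature. A sketch with random placeholder features (the 64-d local and 1024-d global sizes follow the architecture described above; n = 2048 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2048                                    # number of points (arbitrary here)
local = rng.standard_normal((n, 64))        # per-point features from an early layer
global_feat = rng.standard_normal(1024)     # max-pooled global feature

# Tile the global feature across all n points and concatenate it with
# each point's local feature -> one 1088-d feature per point.
combined = np.concatenate(
    [local, np.broadcast_to(global_feat, (n, 1024))], axis=1)
assert combined.shape == (n, 1088)
```

Each of these 1088-d per-point vectors then passes through further shared MLP layers to produce a per-point segmentation score.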


Origin blog.csdn.net/xsh_roy/article/details/124667470