Understanding PointNet++

1. What problems of PointNet does PointNet++ solve?

  1. PointNet's point-wise MLP encodes each point independently, so its ability to aggregate local structural information is weak --> PointNet++'s improvement: sampling and grouping aggregate local neighborhoods (see the set abstraction sketch after this list).
  2. PointNet obtains its global feature directly by max pooling, which loses a great deal of information for both classification and segmentation --> PointNet++'s improvement: a hierarchical feature learning framework that downsamples step by step through multiple set abstraction levels, yielding local and global features at different scales and levels (the output of the last set abstraction can be regarded as the global feature). Global features obtained this way are higher-level and more expressive.
  3. For segmentation, PointNet copies the global feature and concatenates it directly with each point's local feature; concatenated this way, the available global context is too thin and the ability to produce discriminative features is limited --> PointNet++'s improvement: an encoder-decoder structure designed for segmentation, which first downsamples and then upsamples, using skip connections to concatenate the local and global features of corresponding layers. Such features are higher-level and more discriminative (see the feature propagation sketch after this list).
  4. PointNet++ also addresses non-uniform point density: features learned in dense regions may not suit sparse regions. The solution is to extract features from sub-regions with different radii and then concatenate them (multi-scale grouping); the feature dimensions used for sub-regions of different radii also vary with the radius.
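
To make items 1 and 2 concrete, here is a minimal pure-PyTorch sketch of one single-scale set abstraction level. This is my own simplification for illustration: kNN grouping stands in for ball query, and the shared MLP is a single Conv2d block, whereas the official implementation uses custom CUDA kernels and deeper MLPs.

```python
import torch
import torch.nn as nn

def farthest_point_sample(xyz, m):
    # Iteratively pick the point farthest from the already-selected set.
    # xyz: (B, N, 3) -> indices of the m sampled centroids, shape (B, m).
    B, N, _ = xyz.shape
    idx = torch.zeros(B, m, dtype=torch.long, device=xyz.device)
    dist = torch.full((B, N), float("inf"), device=xyz.device)
    farthest = torch.zeros(B, dtype=torch.long, device=xyz.device)
    batch = torch.arange(B, device=xyz.device)
    for i in range(m):
        idx[:, i] = farthest
        centroid = xyz[batch, farthest].unsqueeze(1)                 # (B, 1, 3)
        dist = torch.minimum(dist, ((xyz - centroid) ** 2).sum(-1))  # distance to nearest selected point
        farthest = dist.argmax(-1)                                   # next farthest point
    return idx

class SetAbstraction(nn.Module):
    # One single-scale SA level: sample centroids, group neighbors,
    # apply a shared point-wise MLP, then max-pool inside each neighborhood.
    def __init__(self, m, k, in_ch, out_ch):
        super().__init__()
        self.m, self.k = m, k
        self.mlp = nn.Sequential(
            nn.Conv2d(in_ch + 3, out_ch, 1), nn.BatchNorm2d(out_ch), nn.ReLU())

    def forward(self, xyz, feats):
        # xyz: (B, N, 3) coordinates; feats: (B, N, C) point features.
        B = xyz.size(0)
        batch = torch.arange(B, device=xyz.device).view(B, 1)
        centers = xyz[batch, farthest_point_sample(xyz, self.m)]     # (B, m, 3)
        # kNN grouping as a simple stand-in for ball query.
        knn = torch.cdist(centers, xyz).topk(self.k, largest=False).indices  # (B, m, k)
        nb = batch.view(B, 1, 1)
        grouped = torch.cat([xyz[nb, knn] - centers.unsqueeze(2),    # relative coords (B, m, k, 3)
                             feats[nb, knn]], dim=-1)                # neighbor feats  (B, m, k, C)
        new_feats = self.mlp(grouped.permute(0, 3, 1, 2)).max(-1).values  # (B, out_ch, m)
        return centers, new_feats.transpose(1, 2)                    # (B, m, 3), (B, m, out_ch)
```

Stacking such levels (e.g. 1024 -> 256 -> 64 points) is exactly the step-by-step downsampling of item 2, and pooling the last level yields the global feature. For item 3's decoder side, here is a sketch of feature propagation under the same assumptions: sparse-level features are interpolated back onto the dense level by inverse-distance weighted 3-NN and concatenated with the skip-connected features (a shared MLP would normally follow the concatenation):

```python
def feature_propagation(xyz_dense, xyz_sparse, feats_dense, feats_sparse):
    # Upsample sparse-level features onto the dense level, then concatenate
    # the skip-connected dense features.
    dist, idx3 = torch.cdist(xyz_dense, xyz_sparse).topk(3, largest=False)  # (B, Nd, 3)
    w = 1.0 / dist.clamp(min=1e-10)
    w = w / w.sum(-1, keepdim=True)                              # inverse-distance weights
    B = xyz_dense.size(0)
    nb = torch.arange(B, device=xyz_dense.device).view(B, 1, 1)
    interp = (feats_sparse[nb, idx3] * w.unsqueeze(-1)).sum(2)   # (B, Nd, Cs)
    return torch.cat([feats_dense, interp], dim=-1)              # (B, Nd, Cd + Cs)
```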

2. The SA module of PointNet++ contains a non-differentiable FPS sampling operation. How can the gradient be backpropagated?

A whole class of operations in 3D detection can loosely be described as "routing": features are taken from somewhere, or gathered to somewhere. Building the route itself can be non-differentiable (e.g. FPS and ball query), but once the route is established, the feature transfer and mapping along it are generally differentiable. So the backpropagation of the FPS-driven gather works much like the gradient of torch.scatter: initialize a zero tensor, then write the output gradients back into it according to the indices.
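
A minimal plain-PyTorch illustration of this point (nothing pointnet++-specific): the index tensor is the fixed "route", and in the backward pass gradients land only on the rows that were gathered, exactly as if a zero tensor were initialized and then scatter-added into:

```python
import torch

feats = torch.randn(5, 3, requires_grad=True)
idx = torch.tensor([0, 2, 2])   # the "route", e.g. built by non-differentiable FPS
gathered = feats[idx]           # differentiable gather along the established route
gathered.sum().backward()
print(feats.grad)
# Row 0 receives gradient 1, row 2 receives gradient 2 (it was selected twice),
# and rows 1, 3, 4 stay 0. No gradient flows into idx itself.
```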

For a concrete example in pointnet++ (custom CUDA operators written in C++ cannot be stepped into with the Python debugger; you can only reach the call site, so a global search of the codebase is the practical approach): for operations that are non-differentiable in the forward pass, the backward function must be rewritten by hand, and the rewritten backward may itself call hand-written CUDA code. Notice that only the first return value of the rewritten backward carries a gradient; the second is None. The arguments of backward are the gradients of the forward outputs, and the return values of backward correspond one-to-one to the forward inputs. So idx receives no gradient and is truncated: the idx path is dead, while the features path still carries gradients.

Case: (figure: the custom Function's rewritten forward and backward)
External calling logic: (figure: the Python-side call into the custom Function)
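
In place of the screenshots, here is a hedged pure-PyTorch sketch of the same pattern. The names GatherPoints and gather_points are mine, and ordinary torch ops stand in for the repo's hand-written CUDA kernels; the key behavior matches the text above, namely that backward returns a gradient for features and None for idx:

```python
import torch
from torch.autograd import Function

class GatherPoints(Function):
    # Stand-in for the CUDA gather operation in pointnet++ (the real code
    # calls hand-written CUDA kernels in both forward and backward).

    @staticmethod
    def forward(ctx, features, idx):
        # features: (B, C, N); idx: (B, M) from FPS, the non-differentiable route.
        ctx.save_for_backward(idx)
        ctx.N = features.size(2)
        idx_exp = idx.unsqueeze(1).expand(-1, features.size(1), -1)  # (B, C, M)
        return features.gather(2, idx_exp)                           # (B, C, M)

    @staticmethod
    def backward(ctx, grad_out):
        # grad_out: gradient of the forward output, shape (B, C, M).
        (idx,) = ctx.saved_tensors
        B, C, M = grad_out.shape
        grad_features = grad_out.new_zeros(B, C, ctx.N)              # initialize a zero tensor
        idx_exp = idx.unsqueeze(1).expand(-1, C, -1)
        grad_features.scatter_add_(2, idx_exp, grad_out)             # scatter gradients back
        return grad_features, None   # features gets a gradient; the idx path is truncated

gather_points = GatherPoints.apply
```

Running, say, gather_points(torch.randn(2, 64, 1024, requires_grad=True), torch.randint(0, 1024, (2, 256))).sum().backward() populates the gradient of features while idx stays gradient-free, confirming that the idx path is dead and the features path is alive.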

