[Paper Reading] [arXiv 2022.02] Unsupervised Representation Learning for Point Clouds: A Survey


Abstract

1 INTRODUCTION


2 BACKGROUND

2.1 Basic concepts


3 POINT CLOUD DATASETS

4 COMMON DEEP ARCHITECTURES FOR POINT CLOUD LEARNING

5 UNSUPERVISED POINT CLOUD REPRESENTATION LEARNING

As shown in Figure 2, existing methods for unsupervised point cloud representation learning are classified into four categories: generation-based methods, context-based methods, multi-modal-based methods, and local-descriptor-based methods. Based on this taxonomy, the existing methods are reviewed in detail as follows.
Figure 2: Classification of existing unsupervised point cloud representation learning methods

5.1 Generation-based methods

Generation-based unsupervised point cloud representation learning involves generating point cloud objects and can be further divided into four sub-categories according to the pretext task: point cloud self-reconstruction (generating the same point cloud object as the input), point cloud GAN (generating fake point cloud objects), point cloud up-sampling (generating a point cloud of similar shape but denser than the input), and point cloud completion (predicting the missing parts of partial point cloud objects). The ground truth for training these methods is the point cloud itself, which requires no human annotation, so they can be viewed as unsupervised learning. The generation-based methods are listed in Table 2.
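Because both the network output and the target are unordered point sets, these generation tasks are typically trained with a permutation-invariant reconstruction loss such as the Chamfer distance (Earth Mover's Distance is another common choice). A minimal PyTorch sketch, as my own illustration rather than any particular paper's implementation:

```python
import torch

def chamfer_distance(p1: torch.Tensor, p2: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer distance between two batched point sets.

    p1: (B, N, 3), p2: (B, M, 3). For every point, find its nearest
    neighbor in the other set and average the squared distances.
    """
    diff = p1.unsqueeze(2) - p2.unsqueeze(1)   # (B, N, M, 3)
    dist = (diff ** 2).sum(-1)                 # (B, N, M) squared distances
    d1 = dist.min(dim=2).values                # (B, N): p1 -> nearest in p2
    d2 = dist.min(dim=1).values                # (B, M): p2 -> nearest in p1
    return d1.mean() + d2.mean()
```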

Table 2: Overview of generation-based unsupervised point cloud representation learning methods

5.1.1 Learning through point cloud self-reconstruction

One of the most common unsupervised approaches to learning point cloud representations is self-reconstruction of 3D objects: a point cloud sample is encoded into a representation vector and then decoded back to the original input. During this process, shape information and semantic structure are extracted and encoded into the representation. Since no human annotation is involved, it belongs to unsupervised learning. A typical and widely used model is the autoencoder [69]. As shown in Figure 11, it consists of an encoder network and a decoder network. The encoder compresses the point cloud object into a low-dimensional embedding vector called a codeword [56], which is then decoded back into 3D space, with the output required to match the input. By trying to regenerate the input from the encoding, the autoencoder learns a dimensionality-reduced representation and is trained to ignore unimportant data ("noise") [70].
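To make the pipeline concrete, below is a minimal sketch of such an autoencoder, assuming a PointNet-like shared per-point MLP encoder with max pooling and a simple fully connected decoder; the architectures in the surveyed papers (e.g., FoldingNet's folding-based decoder [56]) are more elaborate:

```python
import torch
import torch.nn as nn

class PointCloudAutoencoder(nn.Module):
    """Encoder compresses N points into a codeword; decoder regenerates N points."""

    def __init__(self, num_points: int = 2048, code_dim: int = 512):
        super().__init__()
        self.num_points = num_points
        # PointNet-like encoder: shared per-point MLP, then max pooling
        self.encoder = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, code_dim, 1),
        )
        # Simple fully connected decoder back to N x 3 coordinates
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 1024), nn.ReLU(),
            nn.Linear(1024, num_points * 3),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.encoder(x.transpose(1, 2))      # (B, 3, N) -> (B, code_dim, N)
        codeword = feat.max(dim=2).values           # (B, code_dim), permutation invariant
        out = self.decoder(codeword)                # (B, N * 3)
        return out.view(-1, self.num_points, 3)     # reconstructed point cloud
```

Training minimizes a reconstruction loss, e.g. the Chamfer distance above, between the decoded points and the input; the learned codeword is then reused as the representation for downstream tasks.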

Following this idea, a series of unsupervised methods based on self-reconstruction have been proposed, with later works further extracting local geometric features.

5.1.2 Learning through point cloud GAN


5.1.3 Learning through point cloud up-sampling

As shown in Fig. 13, given a set of points, the point cloud up-sampling task aims to generate a denser point set, which requires the deep point cloud network to learn the underlying geometry of 3D shapes. Since no human annotation is involved, it belongs to unsupervised learning.
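One common design, loosely following PU-Net-style multi-branch feature expansion (a simplified sketch of the idea, not the exact architecture of any surveyed method), learns per-point features, replicates them through r independent branches, and regresses output coordinates:

```python
import torch
import torch.nn as nn

class SimpleUpsampler(nn.Module):
    """Feature-expansion up-sampler: N input points -> up_ratio * N output points."""

    def __init__(self, up_ratio: int = 4, feat_dim: int = 128):
        super().__init__()
        self.feature = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, feat_dim, 1), nn.ReLU(),
        )
        # One 1x1 conv branch per replica, as in PU-Net's multi-branch expansion
        self.expansions = nn.ModuleList(
            [nn.Conv1d(feat_dim, feat_dim, 1) for _ in range(up_ratio)]
        )
        self.coord = nn.Sequential(
            nn.Conv1d(feat_dim, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 3, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.feature(x.transpose(1, 2))                           # (B, C, N)
        expanded = torch.cat([e(f) for e in self.expansions], dim=2)  # (B, C, r*N)
        return self.coord(expanded).transpose(1, 2)                   # (B, r*N, 3)
```

The network is trained so that the up-sampled output matches a dense ground-truth cloud, again with a set-based loss such as the Chamfer distance.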


5.1.4 Learning through point cloud completion

Point cloud completion is the task of predicting the missing parts of a 3D point cloud object from the remaining points. To predict the missing parts correctly, the network needs to learn the underlying geometric and semantic knowledge of objects, which can then be transferred to downstream tasks. Because point cloud completion requires no human annotation, these methods belong to unsupervised learning.
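In practice the partial inputs can be synthesized for free from complete training shapes, for example by removing all points near a random anchor to simulate occlusion. A hypothetical helper sketching this idea (the surveyed methods use a variety of corruption schemes):

```python
import torch

def make_partial(points: torch.Tensor, drop_ratio: float = 0.25) -> torch.Tensor:
    """Drop the points closest to a random anchor point, simulating occlusion.

    points: (N, 3). Returns the remaining (N - k, 3) points.
    """
    n = points.shape[0]
    k = int(n * drop_ratio)
    anchor = points[torch.randint(n, (1,))]   # random seed point, shape (1, 3)
    dist = ((points - anchor) ** 2).sum(-1)   # squared distance to anchor, (N,)
    keep = dist.topk(n - k).indices           # keep the n - k farthest points
    return points[keep]
```

A completion network is then trained to predict the complete cloud from make_partial(points), e.g. with a Chamfer loss against the original shape.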

Recently, recovering missing parts from incomplete inputs as a pretext task has proven very successful in NLP [5], [6] and 2D vision [10], while there is little such research on unsupervised point cloud learning. We believe this is a promising direction for future research. (Perhaps because of the small amount of data?)

5.1.5 Discussion

Unsupervised learning of point clouds based on generative tasks is a major research direction with a long history. Existing methods mainly focus on learning from object-level point clouds, while the scarcity of studies on scene-level data limits the application of unsupervised learning. In contrast, generation-based unsupervised learning methods have achieved great success in NLP [6], [32] and 2D vision [10]. We therefore think there is great potential in this area.

5.2 Context-based methods

Another class of unsupervised point cloud learning methods is context-based. Unlike generation-based methods that learn by generating point clouds, these methods use discriminative pretext tasks to learn different contexts of point clouds, including context similarity, spatial context structure, and temporal context structure. Table 3 summarizes this series of methods.


5.2.1 Learning with context similarity

This approach performs unsupervised learning by exploring latent contextual similarity between samples. A typical method is contrastive learning, which in recent years has been widely used in 2D [7], [8], [104] and 3D [3], [46], [96] unsupervised representation learning and has shown excellent performance. Figure 15 shows an example of instance-wise contrastive learning.
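The typical training objective is an InfoNCE-style loss: embeddings of two augmented views of the same point cloud are pulled together, while embeddings of different clouds in the batch are pushed apart. A minimal sketch (instance-level and one direction only; methods such as PointContrast instead contrast point-level features):

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Instance-wise contrastive (InfoNCE) loss, one direction for brevity.

    z1, z2: (B, D) embeddings of two augmented views of the same B point clouds.
    Row i of z2 is the positive for row i of z1; all other rows are negatives.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                     # (B, B) cosine similarities
    targets = torch.arange(z1.shape[0], device=z1.device)  # positives on the diagonal
    return F.cross_entropy(logits, targets)
```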

5.2.2 Learning with spatial context structure


5.2.3 Learning with temporal context structure


5.2.4 Discussion


5.3 Multi-modal-based methods


5.4 Local descriptor-based methods


6 BENCHMARK PERFORMANCES


6.1 Object-level tasks

6.1.1 Object classification

The backbones are mostly PointNet, PointNet++, DGCNN, and RSCNN, so these networks must be mastered.
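As a refresher, the idea these backbones share is a symmetric (permutation-invariant) aggregation over per-point features. A stripped-down PointNet sketch, omitting the input/feature transform T-Nets of the original:

```python
import torch
import torch.nn as nn

class MiniPointNet(nn.Module):
    """Minimal PointNet classifier: shared MLP + max pooling + FC head."""

    def __init__(self, num_classes: int = 40):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, 3); max pooling makes the global feature order-invariant
        feat = self.mlp(x.transpose(1, 2)).max(dim=2).values  # (B, 1024)
        return self.head(feat)                                # (B, num_classes)
```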


6.1.2 Object part segmentation


6.2 Scene-level tasks

7 FUTURE DIRECTIONS


8 CONCLUSION

