Weakly Supervised Semantic Segmentation of Large-Scale Point Clouds

Summary

  1. A proxy task, point cloud colorization, is constructed to transfer priors learned from large unlabeled point clouds into the weakly supervised network via self-supervised learning. In this way, guided by a heterogeneous task, the representation ability of the weakly supervised network is improved.
  2. To generate pseudo-labels for unlabeled data, a sparse label propagation mechanism based on generated class prototypes is proposed, which measures the classification confidence of unlabeled points.

Method in this paper

The method uses knowledge transfer and label propagation to address the instability and poor feature representation of networks trained under weak supervision on large-scale point cloud segmentation. It consists of the following three parts:

  1. Learning prior knowledge with a self-supervised proxy task
  2. Fine-tuning the weakly supervised semantic segmentation network with this prior knowledge
  3. Sparse label propagation, which produces pseudo-labels for unlabeled data and improves the effectiveness of weakly supervised training

Point cloud colorization is used as the self-supervised proxy task to learn a prior for initialization, and a local-aware regularization term is proposed to capture contextual information. The weakly supervised network is then initialized with the pre-trained parameters of the encoder to improve the effectiveness of its feature representations.
Furthermore, the labeled points directly supervise the network and fine-tune its parameters. A nonparametric label propagation method for weakly supervised semantic segmentation is also introduced: some unlabeled points are assigned pseudo-labels according to the similarity between their embeddings and the class prototypes, so that more supervision is available during training. Considering the computational and storage efficiency required for large-scale point clouds, RandLA-Net is chosen as the backbone, an efficient and lightweight architecture for semantic segmentation of large-scale point clouds. The self-supervised proxy task and the sparse label propagation method are described in the following sections.
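As a rough orientation, here is a minimal, runnable sketch of this pipeline (pretext pretraining followed by fine-tuning with sparse labels). A tiny per-point MLP and random tensors stand in for the RandLA-Net encoder and real point clouds, and MSE regression stands in for the colorization loss; all names and shapes are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

D_IN, D_FEAT, N_CLASSES, N_PTS = 6, 32, 13, 2048

# Stand-in for the RandLA-Net encoder (a per-point MLP, for illustration only).
encoder = nn.Sequential(nn.Linear(D_IN, 64), nn.ReLU(), nn.Linear(64, D_FEAT))
color_head = nn.Linear(D_FEAT, 6)        # predicts a, b and their local mean / variance
seg_head = nn.Linear(D_FEAT, N_CLASSES)  # segmentation head used after pretraining

# Stage 1: self-supervised colorization pretraining on toy data.
x_ssl = torch.randn(N_PTS, D_IN)          # xyz + replicated L channel
t_ssl = torch.randn(N_PTS, 6)             # a, b and local-statistics targets
opt = torch.optim.Adam([*encoder.parameters(), *color_head.parameters()], lr=1e-3)
for _ in range(10):
    loss = nn.functional.mse_loss(color_head(encoder(x_ssl)), t_ssl)
    opt.zero_grad(); loss.backward(); opt.step()

# Stages 2-3: fine-tune segmentation with a handful of labeled points; the encoder
# keeps its pretrained weights, and the pseudo-label loss from sparse label
# propagation would be added to `loss` here.
x_seg = torch.randn(N_PTS, D_IN)
labeled_idx = torch.arange(20)            # only 20 annotated points
labels = torch.randint(0, N_CLASSES, (20,))
opt = torch.optim.Adam([*encoder.parameters(), *seg_head.parameters()], lr=1e-3)
for _ in range(10):
    logits = seg_head(encoder(x_seg))
    loss = nn.functional.cross_entropy(logits[labeled_idx], labels)
    opt.zero_grad(); loss.backward(); opt.step()
```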

Self-supervised proxy task

Unlike training from scratch, colorization, as in 2D vision tasks, provides a strong supervisory signal. Training data are easy to collect, since any colored point cloud can serve as training data, and thanks to advances in point cloud acquisition equipment a large amount of unlabeled point cloud data with color information is available. We therefore study self-supervised learning of point cloud colorization as the proxy task: its purpose is to guide the self-supervised model to learn feature representations. The Lab color space is used because distances in it approximate perceived color differences.

Therefore, we perform point cloud colorization via a, b completion in this color space. Given the luminance channel L, the network predicts the a and b color channels and a local Gaussian distribution for each point. Note that the value of channel L is replicated three times at each point to keep the same dimensionality as the input of the segmentation task. The input point cloud is therefore $X^s = [x_1, x_2, \dots, x_{N^s}] \in \mathbb{R}^{N^s \times 6}$, consisting of $N^s$ 3D points with xyz coordinates and the replicated L channel, where $N^s$ is the number of points in a point cloud.
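A small sketch of how this input and the colorization target could be constructed (not the paper's code; the helper name and the use of scikit-image's rgb2lab are assumptions):

```python
import numpy as np
from skimage.color import rgb2lab

def build_colorization_sample(xyz: np.ndarray, rgb: np.ndarray):
    """xyz: (N, 3) coordinates, rgb: (N, 3) colors in [0, 1]."""
    lab = rgb2lab(rgb.reshape(-1, 1, 3)).reshape(-1, 3)      # columns: L, a, b
    L, ab = lab[:, :1], lab[:, 1:]
    x_input = np.concatenate([xyz, np.repeat(L, 3, axis=1)], axis=1)  # (N, 6): xyz + LLL
    return x_input.astype(np.float32), ab.astype(np.float32)          # regression target: a, b

xyz, rgb = np.random.rand(1024, 3), np.random.rand(1024, 3)
x_in, ab_target = build_colorization_sample(xyz, rgb)
print(x_in.shape, ab_target.shape)   # (1024, 6) (1024, 2)
```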

Furthermore, we adapt RandLA-Net to this self-supervised task by modifying its final output layer, so that the network outputs a 6-dimensional vector containing the predicted a, b and the corresponding local mean and variance.

For a point $x_i$, the output is therefore
$$\hat{y}_i = \left[a_i,\ b_i,\ \mu_{a,i},\ \sigma_{a,i},\ \mu_{b,i},\ \sigma_{b,i}\right],$$
where $a$, $b$, $\mu$ and $\sigma$ denote the predicted a, b values and the corresponding local mean and variance, respectively.
Furthermore, to learn the local color distribution of each point, we introduce a local-aware regularization term. If the network can predict the color distribution (mean and variance) of a point's neighbors, it embeds local features whose local information is consistent with the segmentation task. Given a point $x_i$ as the centroid, its local neighborhood $\mathcal{N}(x_i)$ is computed by KNN according to the Euclidean distance. The ground-truth values $\mu_{a,i}$ and $\sigma_{a,i}$ for the a channel are then given by
$$\mu_{a,i} = \frac{1}{|\mathcal{N}(x_i)|} \sum_{x_j \in \mathcal{N}(x_i)} a_j, \qquad
\sigma_{a,i} = \frac{1}{|\mathcal{N}(x_i)|} \sum_{x_j \in \mathcal{N}(x_i)} \left(a_j - \mu_{a,i}\right)^2,$$
and analogously for the b channel.
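A small sketch of computing these local statistics with a k-nearest-neighbour search (scipy's cKDTree and the value of k are illustrative choices, not taken from the paper):

```python
import numpy as np
from scipy.spatial import cKDTree

def local_color_stats(xyz: np.ndarray, a: np.ndarray, k: int = 16):
    """xyz: (N, 3) coordinates, a: (N,) values of the a channel."""
    _, idx = cKDTree(xyz).query(xyz, k=k)    # (N, k) neighbour indices (the point itself included)
    neigh = a[idx]                            # (N, k) neighbour colour values
    return neigh.mean(axis=1), neigh.var(axis=1)   # mu_{a,i}, sigma_{a,i}

xyz, a = np.random.rand(1024, 3), np.random.rand(1024)
mu_a, sigma_a = local_color_stats(xyz, a, k=16)
```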

Why does knowledge learned from proxy tasks benefit semantic segmentation?
Proxy tasks learn similar feature distributions to semantic segmentation tasks. Objects of the same category usually have similar color distributions, e.g. vegetation is usually green while roads are black. The surface color texture of the scene provides ample cues for many categories.
The proxy task embeds local feature representations. The locality-aware regularization term constrains the predicted local color distribution to be consistent with the original one, which forces the network to embed more local information and thereby enhances the local feature embeddings used in the semantic segmentation task.

Sparse Label Propagation

[Figure: framework of sparse label propagation, consisting of class prototype generation, class assignment matrix construction, and sparse pseudo-label generation]

When only a few points are labeled, segmentation performance drops significantly. The main reason is that the supervision provided by the few labeled points cannot be propagated well to the unlabeled points. Therefore, we use the labeled points to assign pseudo-labels to unlabeled points, providing additional supervision that improves the representation of the weakly supervised network.
To achieve this goal, the following aspects need to be considered:

  1. Computational and memory costs must stay low. Large-scale point clouds usually contain ~10^6 points, and building a fully connected graph with all points as nodes would consume a lot of memory and computing resources.
  2. Anchor points should be sparse. Ambiguous points should not be given labels used to train the network.
  3. Propagated labels should be soft. A propagated label should be related to the similarity: the higher the similarity, the more similar the label.
A sparse label propagation method is designed. The overall framework is shown in the figure. It consists of three parts: class prototype generation, class assignment matrix construction, and sparse pseudo-label generation.

Class prototype generation
From the last two layers of the network we obtain the embeddings $Z = [z^l_1, z^l_2, \dots, z^l_M; z^u_1, z^u_2, \dots, z^u_N] \in \mathbb{R}^{(M+N)\times d}$ and the corresponding predictions $Y = [y^l_1, y^l_2, \dots, y^l_M; y^u_1, y^u_2, \dots, y^u_N] \in \mathbb{R}^{(M+N)\times C}$ for the $M$ labeled and $N$ unlabeled points. We denote the embeddings of labeled and unlabeled points by $Z^l$ and $Z^u$, respectively. $C$ prototypes are first generated from the labeled points, one per class; specifically, the prototype of a class is simply the average of the labeled-point embeddings in $Z^l$ belonging to that class. For class $c$, the prototype $\rho_c$ is:
$$\rho_c = \frac{1}{|\mathcal{I}_c|} \sum_{i \in \mathcal{I}_c} z^l_i,$$
where $\mathcal{I}_c$ denotes the index set of labeled points belonging to class $c$.
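A small sketch of this step (shapes and function names are illustrative, not the paper's code):

```python
import numpy as np

def class_prototypes(z_labeled: np.ndarray, labels: np.ndarray, num_classes: int):
    """z_labeled: (M, d) embeddings of labeled points, labels: (M,) class ids."""
    protos = np.zeros((num_classes, z_labeled.shape[1]), dtype=z_labeled.dtype)
    for c in range(num_classes):
        members = z_labeled[labels == c]
        if len(members) > 0:
            protos[c] = members.mean(axis=0)   # rho_c: mean embedding of class c
    return protos

z_l = np.random.randn(20, 32)                # M = 20 labeled points, d = 32
y_l = np.random.randint(0, 13, size=20)      # C = 13 classes
rho = class_prototypes(z_l, y_l, num_classes=13)   # (13, 32)
```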
Class assignment matrix construction
A similarity matrix $W \in \mathbb{R}^{N \times C}$ is constructed, where each entry $w_{ic}$ measures the similarity between the embedding $z^u_i$ of the $i$-th unlabeled point and the class prototype $\rho_c$; the class assignment matrix $S$ used below is derived from $W$.
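A hedged sketch of this step: the summary only states that $W$ holds similarities between unlabeled-point embeddings and the class prototypes, so the cosine similarity and the row-wise softmax used below are assumptions made for illustration.

```python
import numpy as np

def class_assignment(z_unlabeled: np.ndarray, prototypes: np.ndarray):
    """z_unlabeled: (N, d), prototypes: (C, d) -> W and S, both of shape (N, C)."""
    zn = z_unlabeled / (np.linalg.norm(z_unlabeled, axis=1, keepdims=True) + 1e-8)
    pn = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-8)
    W = zn @ pn.T                                           # cosine similarity (assumed)
    S = np.exp(W) / np.exp(W).sum(axis=1, keepdims=True)    # soft class assignment (assumed)
    return W, S

W, S = class_assignment(np.random.randn(1000, 32), np.random.randn(13, 32))
```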
Sparse pseudo label generation
Each class has some points with low similarity; these points are not suitable for providing supervision to train the network. Specifically, for each class, according to the class assignment matrix $S$, the top-$k$ unlabeled points are selected to obtain a mask $M^k \in \{0,1\}^{N\times C}$, where $m^k_{ic} = 1$ means that the embedding of the $i$-th unlabeled point is among the top-$k$ most similar to class $c$, and $N$ is the number of unlabeled points. This is a class-balanced way of extending labels,
which can alleviate class imbalance to some extent.
Since an unlabeled point may be selected by multiple classes, the most similar class is chosen to generate a binary point mask $M^{pt} \in \{0,1\}^N$, where $m^{pt}_i = 1$ means that the $i$-th point is assigned a pseudo-label. Sparse pseudo-labels $Y^p \in \mathbb{R}^{N\times C}$ are then obtained by combining these masks with the soft scores in the class assignment matrix, so that only the selected points carry soft supervision (a sketch of this step is given below).
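A hedged sketch of sparse pseudo-label generation: per class, the $k$ unlabeled points with the highest assignment scores are kept; a point selected by several classes keeps only its most similar class. The exact way $Y^p$ combines the masks with $S$ is an assumption here.

```python
import numpy as np

def sparse_pseudo_labels(S: np.ndarray, k: int):
    """S: (N, C) class assignment scores -> Y_p (N, C) and point mask (N,)."""
    N, C = S.shape
    Mk = np.zeros((N, C), dtype=bool)
    for c in range(C):
        topk = np.argsort(-S[:, c])[:k]        # the k points most similar to class c
        Mk[topk, c] = True
    best = S.argmax(axis=1)                     # most similar class of each point
    Mpt = Mk[np.arange(N), best]                # keep a point only if selected for its best class
    Yp = np.zeros_like(S)
    Yp[Mpt, best[Mpt]] = S[Mpt, best[Mpt]]      # soft pseudo-label for the chosen class (assumed form)
    return Yp, Mpt

S = np.random.rand(1000, 13)
Yp, Mpt = sparse_pseudo_labels(S, k=30)
```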
Compared with traditional fully connected graph label propagation methods, this method is computationally efficient: its complexity is $O(NCd)$, while the complexity of the fully connected graph method is $O((N+M)^2 d)$, where $C$ is the number of categories and $d$ is the dimension of the embeddings and prototypes. Since $C$ is on the order of $10^1$, it is much smaller than $N$.

Loss function
The training objective has two terms: a segmentation loss computed on the $M$ labeled points with their ground-truth labels, and a pseudo-label loss computed on the unlabeled points selected by the point mask, using the sparse pseudo-labels $Y^p$ as targets.
A balancing parameter $\lambda$ is introduced to weight these two losses:
$$\mathcal{L} = \mathcal{L}_{seg} + \lambda\, \mathcal{L}_{pseudo}.$$
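A hedged sketch of this combined loss; using cross-entropy for both terms (with soft targets for the pseudo-labeled points) is an assumption made for illustration, and all names and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def weakly_supervised_loss(logits, labels, labeled_mask, pseudo_labels, pseudo_mask, lam=1.0):
    """logits: (P, C); labels: (P,) valid where labeled_mask is True;
    pseudo_labels: (P, C) soft targets; labeled_mask, pseudo_mask: (P,) boolean."""
    loss_seg = F.cross_entropy(logits[labeled_mask], labels[labeled_mask])
    # Soft-target cross-entropy on the pseudo-labeled points.
    log_p = F.log_softmax(logits[pseudo_mask], dim=1)
    target = pseudo_labels[pseudo_mask]
    target = target / target.sum(dim=1, keepdim=True).clamp(min=1e-8)
    loss_pseudo = -(target * log_p).sum(dim=1).mean()
    return loss_seg + lam * loss_pseudo

P, C = 4096, 13
logits = torch.randn(P, C, requires_grad=True)
labels = torch.randint(0, C, (P,))
labeled_mask = torch.zeros(P, dtype=torch.bool); labeled_mask[:20] = True
pseudo_mask = torch.zeros(P, dtype=torch.bool); pseudo_mask[100:400] = True
loss = weakly_supervised_loss(logits, labels, labeled_mask, torch.rand(P, C), pseudo_mask, lam=0.5)
loss.backward()
```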

Origin blog.csdn.net/qq_45745941/article/details/129967617