2021 ICCVW paper: Reducing Label Effort: Self-Supervised meets Active Learning

Self-Supervised + Active Learning

1. Summary

  • Active learning reduces manual annotation effort by selecting representative or informative samples for labeling. Self-supervised training learns features from a large pool of unlabeled samples and then tunes a classifier on a small number of labeled samples. This work studies whether the two approaches can benefit from each other. The paper runs experiments on the object recognition datasets CIFAR-10/100 and Tiny ImageNet. The results show that self-supervised training is more effective than active learning at reducing labeling effort, but combining the two is beneficial when the annotation budget is high. The performance gap between active learning on top of self-supervised pretraining and active learning trained from scratch shrinks once nearly half of the dataset is labeled.

2. Introduction

  • Active learning can be divided into two subfields. Informativeness-based methods aim to identify the samples about which the algorithm is most uncertain; adding these samples to the labeled pool is expected to improve performance the most. Representativeness-based methods aim to label data so that every unlabeled sample has a "representative" labeled sample nearby (defined by distance in feature space). Active learning methods are typically evaluated by training the network in a supervised manner only on the labeled pool and comparing which selection strategy obtains the best results (both selection criteria are sketched after this list).
  • The major advances in self-supervised training have come from recent work that learns representations invariant to a range of distortions of the input data (e.g. cropping, blurring, flipping). In these methods, two distorted versions of the image, called views, are produced, and the network is trained by enforcing that the representations of the two views are similar. Different mechanisms have been developed to prevent these networks from collapsing to a trivial solution.
  • Self-supervised learning can learn high-quality features that are nearly identical to those learned by supervised methods. Therefore, it greatly increases the usefulness of unlabeled data. The standard active learning paradigm trains an algorithm on a labeled dataset and, based on the resulting algorithm, selects the data points expected to be the most informative for the algorithm to better understand the problem.
  • Based on our experiments, the following conclusions can be drawn:
      1. In our evaluation on three datasets, self-supervised training is much more effective than AL in reducing labeling effort.
      2. Self-supervised training + AL significantly outperforms AL alone. However, for large annotation budgets (about 50% of the dataset in our experiments), the performance gap decreases.
      3. On the three datasets, self-supervised training + AL slightly outperforms self-supervised training alone, but only for the larger labeling budgets.
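
As an illustration of the two selection criteria mentioned above, here is a minimal NumPy sketch (not the paper's code): predictive entropy as an informativeness score, and a greedy k-center pass as a representativeness-style selection. Function names, shapes, and the choice of these particular criteria are assumptions for illustration.

```python
import numpy as np

def uncertainty_scores(probs):
    """Informativeness: predictive entropy of the classifier's softmax output.
    probs has shape (n_unlabeled, n_classes); higher entropy = more uncertain."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def kcenter_greedy(unlabeled_feats, labeled_feats, budget):
    """Representativeness: greedily pick the unlabeled points farthest (in
    feature space) from the labeled set, so that afterwards every unlabeled
    point lies close to some labeled 'representative'."""
    # distance of each unlabeled point to its nearest labeled point
    d = np.linalg.norm(
        unlabeled_feats[:, None, :] - labeled_feats[None, :, :], axis=2
    ).min(axis=1)
    picked = []
    for _ in range(budget):
        i = int(d.argmax())  # farthest from the current labeled set
        picked.append(i)
        # the picked point now acts as a labeled center; update distances
        d = np.minimum(d, np.linalg.norm(unlabeled_feats - unlabeled_feats[i], axis=1))
    return picked
```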

3. Method

  • We design a framework consisting of two parts: self-supervised training and active learning. First, we pretrain the self-supervised model on the unlabeled samples. Next, we fine-tune a linear classifier on top of the pretrained model using the initial labeled data. We then run an active learning loop with the fine-tuned model to select the most informative or representative samples for labeling, and the newly labeled samples are added to the labeled set.
  • The self-supervised model adopts SimSiam, which is based on Siamese networks and maximizes the similarity between two augmented views of an image, subject to conditions that avoid collapsed solutions. This allows us to obtain meaningful representations without using negative pairs. Such rich representations may also benefit representativeness-based active learning methods.
    Network framework diagram
  • The framework consists of 3 stages:
       1) The backbone is pretrained with self-supervised learning on the entire (unlabeled) dataset.
       2) The backbone weights are frozen and, given a small amount of labeled data, a linear classifier or support vector machine on top is fine-tuned in a supervised manner.
       3) The model runs inference on the unlabeled data, and the acquisition function ranks the samples from least to most informative. The most informative samples are then annotated by experts and added to the labeled set (stages 2 and 3 are sketched right after this list).
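
A minimal sketch of stages 2 and 3, assuming a PyTorch backbone that has already been SimSiam-pretrained (stage 1). The names `backbone` and `feat_dim`, the full-batch training loop, and the entropy acquisition are illustrative choices, not the authors' exact setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def finetune_and_select(backbone, labeled_x, labeled_y, unlabeled_x,
                        feat_dim, num_classes, budget, epochs=50, lr=0.1):
    # Stage 2: freeze the pretrained backbone and fine-tune a linear head only.
    for p in backbone.parameters():
        p.requires_grad = False
    head = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.SGD(head.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        with torch.no_grad():
            feats = backbone(labeled_x)
        loss = F.cross_entropy(head(feats), labeled_y)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Stage 3: run inference on the unlabeled pool, rank samples by predictive
    # entropy (one possible acquisition function) and return the indices of the
    # most informative ones, which would then be sent to the annotators.
    with torch.no_grad():
        probs = F.softmax(head(backbone(unlabeled_x)), dim=1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    return torch.topk(entropy, k=budget).indices
```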

3.1 Active Learning

  • Active learning is usually set up as multiple rounds of iterative training and sample selection; the number of samples selected in each round is called the budget.
  • At the beginning of each cycle, the model is trained on the labeled samples. After training, the model scores the unlabeled samples with the acquisition function and selects a batch for labeling; these newly labeled samples are added to the labeled dataset for the next cycle of training, and this repeats until the number of cycles is reached. The acquisition function is a crucial part of AL; a generic loop is sketched below.
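
A sketch of the generic pool-based loop just described. `train_model` and `acquisition_fn` are placeholders for the training routine and acquisition function of choice, and `x_pool` / `oracle_labels` are assumed to be NumPy arrays whose ground-truth labels stand in for the human oracle.

```python
import numpy as np

def active_learning_loop(train_model, acquisition_fn, x_pool, oracle_labels,
                         initial_labeled_idx, budget, num_cycles):
    labeled = list(initial_labeled_idx)
    unlabeled = [i for i in range(len(x_pool)) if i not in set(labeled)]

    model = None
    for _ in range(num_cycles):
        # 1) Train (or retrain) the model on the current labeled set.
        model = train_model(x_pool[labeled], oracle_labels[labeled])

        # 2) Score the unlabeled pool and pick the `budget` highest-scoring samples.
        scores = acquisition_fn(model, x_pool[unlabeled])
        picked = np.argsort(scores)[-budget:]
        new_idx = [unlabeled[i] for i in picked]

        # 3) "Annotate" the selected samples (here the oracle labels already
        #    exist) and move them into the labeled pool for the next cycle.
        labeled.extend(new_idx)
        unlabeled = [i for i in unlabeled if i not in set(new_idx)]
    return model, labeled
```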

3.2 Self-supervised training

SimSiam architecture

  • One branch of SimSiam gets an additional predictor (an MLP) whose output is trained to be as close as possible to the output of the other branch, while the other branch receives no gradient (stop-gradient) during training. The model is trained to increase the similarity between the two branches' outputs (see the sketch below).
  • In addition to being simple, SimSiam requires neither negative pairs nor large mini-batches, which significantly reduces GPU requirements.
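
A minimal PyTorch sketch of this objective, following the standard SimSiam formulation; the layer sizes and the shared `encoder` (backbone plus projection MLP) are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimSiam(nn.Module):
    def __init__(self, encoder, dim=2048, pred_dim=512):
        super().__init__()
        self.encoder = encoder                  # backbone + projection MLP
        self.predictor = nn.Sequential(         # prediction MLP on one branch
            nn.Linear(dim, pred_dim),
            nn.BatchNorm1d(pred_dim),
            nn.ReLU(inplace=True),
            nn.Linear(pred_dim, dim),
        )

    def forward(self, x1, x2):
        # Two augmented views of the same image pass through the same encoder.
        z1, z2 = self.encoder(x1), self.encoder(x2)
        p1, p2 = self.predictor(z1), self.predictor(z2)
        # Symmetric negative cosine similarity; .detach() is the stop-gradient
        # that prevents collapse to a constant representation.
        loss = -0.5 * (F.cosine_similarity(p1, z2.detach(), dim=1).mean()
                       + F.cosine_similarity(p2, z1.detach(), dim=1).mean())
        return loss
```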

4. Experimental setup

  • We randomly select 1%, 2%, and 10% of the entire dataset, uniformly across all classes, as selection sizes. For one of the datasets we also evaluate selection sizes of 0.1% and 0.2%.
  • In each cycle, training is either restarted from scratch or the backbone is first initialized with self-supervised pretraining. The model is trained over c cycles until the full selection size has been labeled (a small helper turning these percentages into label counts is sketched below).
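
Purely as an illustration (not from the paper's code), a helper that converts the selection percentages above into absolute label counts; the 50,000-image figure is simply CIFAR-10's training-set size used as an example.

```python
def selection_sizes(dataset_size, fractions=(0.01, 0.02, 0.10)):
    """Absolute number of samples to label for each selection percentage."""
    return [round(dataset_size * f) for f in fractions]

# Example with a 50,000-image training set (the size of CIFAR-10):
print(selection_sizes(50_000))                  # -> [500, 1000, 5000]
print(selection_sizes(50_000, (0.001, 0.002)))  # -> [50, 100] for the 0.1%/0.2% runs
```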

5. Experimental results

  • AL performance on CIFAR-10

  • Experiments show that self-supervision greatly reduces the required labels, especially at low budgets (few training labels).

  • Both methods (with and without self-supervised pretraining) reach almost full performance once 50% of the data is labeled, which narrows the gap between the self-supervised and the supervised (from scratch) variants. From an active learning perspective, random sampling outperforms AL when less than 1% of the data is labeled.

  • AL performance on CIFAR-100

  • When approaching 50% labeled data, AL without self-supervised pretraining performs on par with its self-supervised counterpart, implying that the impact of self-supervised training decreases as the budget grows. However, with or without self-supervised pretraining, random sampling outperforms the active learning methods in the low-budget setting.

  • AL performance on Tiny ImageNet

  • Self-supervised pretraining drastically reduces the required labeling in low-budget scenarios. Unlike on the CIFAR datasets, more than 50% of the labels are needed before AL without pretraining closes the performance gap with its self-supervised counterparts. Among the methods that use self-supervised pretraining, random sampling works better, although adding more labeled data, as above, narrows the gap with the AL methods.

  • These experimental results show that SimSiam helps a lot at low budgets within the active learning framework. At high budgets, the performance gap between training from scratch and SimSiam pretraining narrows.

  • Correlation between the number of labeled samples per class required before AL pays off and the number of classes in the dataset: above this budget, AL + self-supervised training performs better than random sampling + self-supervised training.

6. Conclusion

  • The performance gap between active learning and self-supervised training gradually narrows when nearly half of the dataset is labeled.

Origin blog.csdn.net/u013308709/article/details/129109629