[Semi-supervised learning] MixMatch, UDA, ReMixMatch, FixMatch

The state of the art in Semi-Supervised Learning (SSL) has been refreshed by Google again and again, starting with MixMatch, then UDA and ReMixMatch in the same period, and finally FixMatch in 2020.

These four papers on deep semi-supervised learning all build on two ideas: consistency regularization and entropy minimization:

  • Consistency regularization: inject noise into the input image or an intermediate layer; the model's output should stay as constant, or at least as close to the original output, as possible.
  • Entropy minimization: the entropy of the model's predictions on unlabeled data should be as low as possible. Pseudo labeling also implicitly performs entropy minimization.

Consistency Regularization

For each unlabeled instance, consistency regularization requires that the outputs under two independent random noise injections be close to each other.

The methods differ in how they inject noise and how they measure the discrepancy. Noise can come from the model itself (e.g. dropout), from additive noise (e.g. Gaussian noise), or from data augmentation; the consistency term can be an L2 loss, KL divergence, or cross entropy.
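As a concrete illustration, here is a minimal PyTorch sketch of a consistency term between two noisy views of the same unlabeled batch. `model` and `augment` are hypothetical stand-ins for an arbitrary classifier and a random perturbation, and the three branches mirror the L2 / KL / cross-entropy choices mentioned above; detaching one view as the target is a common but optional design choice.

```python
import torch.nn.functional as F

def consistency_loss(model, x_unlabeled, augment, kind="l2"):
    """Penalize disagreement between two randomly perturbed views of the same batch."""
    logits_1 = model(augment(x_unlabeled))   # first noise injection
    logits_2 = model(augment(x_unlabeled))   # second, independent noise injection
    p1 = logits_1.softmax(dim=-1)
    p2 = logits_2.softmax(dim=-1).detach()   # treat one view as the target
    if kind == "l2":                         # squared error on probabilities
        return F.mse_loss(p1, p2)
    if kind == "kl":                         # KL divergence towards the target view
        return F.kl_div(p1.log(), p2, reduction="batchmean")
    # cross entropy against the target view's distribution
    return -(p2 * F.log_softmax(logits_1, dim=-1)).sum(dim=-1).mean()
```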

Entropy Minimization

MixMatch, UDA and ReMixMatch apply entropy minimization indirectly through temperature sharpening, while FixMatch applies it indirectly through pseudo labeling. More generally, any method that produces artificial labels for unlabeled data and then trains on them in a supervised manner (e.g. with a cross-entropy loss) is implicitly performing entropy minimization. In this way, entropy minimization and consistency regularization are both realized through the loss on the unlabeled data.

Both temperature sharpening and pseudo labeling produce artificial labels for unlabeled data. As the temperature of the former approaches 0, the two become equivalent. Pseudo labeling is simpler than temperature sharpening because it has one fewer hyperparameter (the temperature).
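The two labeling schemes are easy to compare in code. Below is an illustrative PyTorch sketch (function names are mine, not from the papers) of temperature sharpening and hard pseudo labeling, showing that sharpening approaches the one-hot pseudo label as T goes to 0.

```python
import torch
import torch.nn.functional as F

def sharpen(probs, T=0.5):
    """Temperature sharpening (MixMatch / UDA / ReMixMatch)."""
    p = probs ** (1.0 / T)
    return p / p.sum(dim=-1, keepdim=True)

def pseudo_label(probs):
    """Hard pseudo label (FixMatch): one-hot vector of the argmax class."""
    return F.one_hot(probs.argmax(dim=-1), num_classes=probs.size(-1)).float()

probs = torch.tensor([[0.6, 0.3, 0.1]])
print(sharpen(probs, T=0.05))   # nearly one-hot as T -> 0
print(pseudo_label(probs))      # tensor([[1., 0., 0.]])
```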

Without entropy minimization, temperature sharpening and pseudo labeling are unnecessary: it is enough to inject random noise into each unlabeled instance twice and push the two outputs towards each other, which already gives consistency regularization.

In other words, by producing artificial labels for unlabeled data, a single loss term can achieve both entropy minimization and consistency regularization.

Combining Consistency Regularization and Entropy Minimization

Generally, the unlabeled set in semi-supervised learning consists of the entire training set; that is, labeled samples are also reused as unlabeled samples.

In semi-supervised learning, the labels of the labeled data are given, while the labels of the unlabeled data are unknown. MixMatch, UDA, ReMixMatch and FixMatch obtain artificial labels for unlabeled data in slightly different ways (a combined sketch follows this list):

  • MixMatch: average the predictions of K weakly augmented views (e.g. shifting and flipping), then apply temperature sharpening;
  • UDA: one prediction from a weakly augmented view, then temperature sharpening;
  • ReMixMatch: one prediction from a weakly augmented view, then distribution alignment, and finally temperature sharpening;
  • FixMatch: one prediction from a weakly augmented view, then take the one-hot hard label.
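A combined sketch of these label-guessing strategies, with a hypothetical helper name rather than code from the papers; distribution alignment is omitted for brevity.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def guess_label(model, weak_views, method="mixmatch", T=0.5):
    """Guess an artificial label for one unlabeled batch.

    weak_views: a list of weakly augmented versions of the same batch
    (K > 1 only for MixMatch; UDA, ReMixMatch and FixMatch use a single view).
    """
    probs = torch.stack([model(v).softmax(dim=-1) for v in weak_views]).mean(dim=0)
    if method == "fixmatch":        # hard one-hot label, no temperature
        return F.one_hot(probs.argmax(dim=-1), num_classes=probs.size(-1)).float()
    # ReMixMatch would additionally apply distribution alignment to `probs` here
    sharpened = probs ** (1.0 / T)  # temperature sharpening
    return sharpened / sharpened.sum(dim=-1, keepdim=True)
```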

With these artificial labels, we can train in a supervised manner; this is where entropy minimization comes in. From the perspective of consistency regularization, we then inject different noise and require the predictions on unlabeled data to stay consistent with their artificial labels.

MixMatch, UDA, ReMixMatch and FixMatch all inject noise by applying data augmentation to the input samples; they differ in the specific method and strength of the augmentation (see the sketch after this list):

  • MixMatch: one weak augmentation produces the prediction, just like ordinary supervised training, except that the unlabeled loss is an L2 loss;
  • UDA: one strong augmentation (RandAugment) produces the prediction;
  • ReMixMatch: multiple strong augmentations (CTAugment) produce predictions that all enter the unlabeled loss, i.e. each unlabeled instance contributes several augmented views in a single step;
  • FixMatch: one strong augmentation (RandAugment or CTAugment) produces the prediction.
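For concreteness, a typical weak/strong augmentation pair might look like the following torchvision sketch (assuming torchvision ≥ 0.11 for `RandAugment`; 32×32 CIFAR-style images and the parameter values are illustrative; ReMixMatch's CTAugment is not available in torchvision):

```python
import torchvision.transforms as transforms

# Weak augmentation: random flip and shift, used to produce the artificial label.
weak = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4, padding_mode="reflect"),
    transforms.ToTensor(),
])

# Strong augmentation: RandAugment on top of the weak transforms (UDA / FixMatch style).
strong = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4, padding_mode="reflect"),
    transforms.RandAugment(num_ops=2, magnitude=9),
    transforms.ToTensor(),
])
```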

Starting with UDA and ReMixMatch, strong augmentation was introduced into semi-supervised training. UDA uses RandAugment as its strong augmentation, ReMixMatch proposes CTAugment, and FixMatch reuses the strong augmentations of UDA and ReMixMatch (RandAugment or CTAugment).

For the loss on the unlabeled data, the four methods also differ (a sketch follows the list):

  • MixMatch: L2 loss;
  • UDA: KL divergence;
  • ReMixMatch: cross entropy (plus a self-supervised rotation loss and a pre-mixup unlabeled loss computed without mixup);
  • FixMatch: cross entropy with a confidence threshold.
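A sketch of these unlabeled-loss variants, computed between the prediction on the strongly augmented view and the artificial label. The function name and the 0.95 threshold are illustrative, and ReMixMatch's extra rotation and pre-mixup terms are omitted.

```python
import torch.nn.functional as F

def unlabeled_loss(logits_strong, target, weak_probs=None,
                   method="fixmatch", threshold=0.95):
    """target: sharpened distribution or one-hot pseudo label from the weak view.
    weak_probs: the model's probabilities on the weak view (needed for FixMatch's mask)."""
    probs = logits_strong.softmax(dim=-1)
    if method == "mixmatch":                 # L2 loss on probabilities
        return F.mse_loss(probs, target)
    if method == "uda":                      # KL divergence to the sharpened target
        return F.kl_div(probs.log(), target, reduction="batchmean")
    ce = -(target * F.log_softmax(logits_strong, dim=-1)).sum(dim=-1)
    if method == "fixmatch":                 # cross entropy, confidence-masked
        mask = (weak_probs.max(dim=-1).values >= threshold).float()
        return (ce * mask).mean()
    return ce.mean()                         # remixmatch: plain cross entropy
```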

FixMatch: Simplifying SSL with Consistency and Confidence

FixMatch simplifies MixMatch, UDA and ReMixMatch and obtains better results (a sketch of a full FixMatch training step follows the two points below):

  • First, temperature sharpening is replaced by pseudo labeling, which is one simplification;
  • Second, FixMatch only computes the unlabeled loss on unlabeled instances whose prediction confidence exceeds a threshold, which allows the weight of the unlabeled loss to stay fixed. This is the second simplification.
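Putting the pieces together, a single FixMatch-style training step might look like the sketch below. The weight `lambda_u`, the threshold, and all names are illustrative; `weak` and `strong` are assumed to be batch-level augmentation functions, and details of the official implementation (separate batch sizes, EMA of weights, etc.) are omitted.

```python
import torch
import torch.nn.functional as F

def fixmatch_step(model, x_labeled, y, x_unlabeled, weak, strong,
                  lambda_u=1.0, threshold=0.95):
    # Supervised part: standard cross entropy on weakly augmented labeled data.
    loss_sup = F.cross_entropy(model(weak(x_labeled)), y)

    # Pseudo label from the weakly augmented view (no gradient, hard argmax label).
    with torch.no_grad():
        weak_probs = model(weak(x_unlabeled)).softmax(dim=-1)
        conf, pseudo = weak_probs.max(dim=-1)
        mask = (conf >= threshold).float()   # only confident instances count

    # Unlabeled part: cross entropy on the strongly augmented view, masked.
    ce = F.cross_entropy(model(strong(x_unlabeled)), pseudo, reduction="none")
    loss_unsup = (ce * mask).mean()

    return loss_sup + lambda_u * loss_unsup  # fixed unlabeled-loss weight
```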

References

[1] Berthelot, D., Carlini, N., Goodfellow, I., Papernot, N., Oliver, A., Raffel, C. (2019). MixMatch: A Holistic Approach to Semi-Supervised Learning arXiv https://arxiv.org/abs/1905.02249
[2] Berthelot, D., Carlini, N., Cubuk, E., Kurakin, A., Sohn, K., Zhang, H., Raffel, C. (2019). ReMixMatch: Semi-Supervised Learning with Distribution Alignment and Augmentation Anchoring arXiv https://arxiv.org/abs/1911.09785
[3] Xie, Q., Dai, Z., Hovy, E., Luong, M., Le, Q. (2019). Unsupervised Data Augmentation for Consistency Training arXiv https://arxiv.org/abs/1904.12848
[4] Sohn, K., Berthelot, D., Li, C., Zhang, Z., Carlini, N., Cubuk, E., Kurakin, A., Zhang, H., Raffel, C. (2020). FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence arXiv https://arxiv.org/abs/2001.07685
