【Paper Reading】MixMatch: A Holistic Approach to Semi-Supervised Learning

Recommended reading:
Pseudo-Label: a simple and effective approach to semi-supervised learning in deep networks — pick the class with the largest predicted probability and use it as if it were the real label.
Entropy regularization and entropy minimization: Entropy Minimization & Regularization

Official GitHub source code: google-research/mixmatch

Summary

MixMatch unifies the current mainstream methods of semi-supervised learning, resulting in a new algorithm that guesses low-entropy labels for data-augmented unlabeled examples and mixes labeled and unlabeled data using MixUp.


1. Introduction

Many semi-supervised learning methods add a loss term computed on unlabeled data that falls into one of three categories:
(1) Entropy minimization: encourages the model to output confident predictions on unlabeled data;
(2) Consistency regularization: encourages the model to produce the same output distribution when its input is perturbed;
(3) Generic regularization: encourages the model to generalize well and avoid overfitting the training data.

MixMatch is an SSL algorithm that elegantly unifies these mainstream methods for semi-supervised learning in a single loss.
Figure 1: Schematic of the label guessing process used in MixMatch. Random data augmentation is applied K times to an unlabeled image, and each augmented image is fed to a classifier. The mean of these K predictions is then "sharpened" by adjusting the temperature of the distribution.

2. Related work


2.1 Consistency Regularization

A common regularization technique in supervised learning is data augmentation, which applies input transformations that are assumed not to affect class semantics. For example, in image classification, it is common to elastically deform or add noise to the input image, which can greatly change the pixel content of the image without changing its label.

Consistency regularization applies data augmentation to unlabeled data: augmented versions of a sample are fed to the classifier, and the predictions should remain consistent. That is, for examples generated from the same sample by stochastic augmentation, the model's predictions should agree. This rule is added to the loss function:

$$\left\| p_{\text{model}}\left(y \mid \text{Augment}(x); \theta\right) - p_{\text{model}}\left(y \mid \text{Augment}(x); \theta\right) \right\|_2^2 \tag{1}$$

Note that Augment(x) is a stochastic transformation, so the two terms in equation (1) are not identical.
MixMatch uses this form of consistency regularization by applying standard data augmentation (random horizontal flips and crops) to images.
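As a rough illustration (a minimal PyTorch sketch, not the authors' code), a consistency loss of the form in equation (1) could be computed as follows; `model` and `augment` are assumed placeholders:

```python
import torch.nn.functional as F

def consistency_loss(model, x, augment):
    """Squared L2 distance between predictions on two random augmentations.

    `model` maps a batch of inputs to logits; `augment` is a stochastic
    transform, so the two calls below generally produce different inputs.
    """
    p1 = F.softmax(model(augment(x)), dim=1)
    p2 = F.softmax(model(augment(x)), dim=1)
    return ((p1 - p2) ** 2).sum(dim=1).mean()
```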

2.2 Entropy Minimization

A common underlying assumption of many semi-supervised learning methods is that the classifier's decision boundary should not pass through high-density regions of the marginal data distribution. One way to enforce this is to require the classifier to output low-entropy predictions on unlabeled data, for instance via a loss term that explicitly minimizes the entropy of the model's predictions on unlabeled data.

"Pseudo-Label" achieves entropy minimization implicitly by constructing hard (1-hot) labels from high-confidence predictions on unlabeled data and using them as the training objective of standard cross-entropy loss. MixMatch also implicitly minimizes entropy by using a "sharpening" function on the target distribution of unlabeled data.

2.3 Traditional Regularization

We use weight decay, which penalizes the L2 norm of the model parameters. We also use MixUp in MixMatch, both as a regularizer (applied to labeled data points) and as a semi-supervised learning method (applied to unlabeled data points). MixUp has been applied to semi-supervised learning before.
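For reference (this is standard MixUp, not MixMatch's modified variant described in Section 3.3), MixUp trains on convex combinations of pairs of examples and their labels:

$$\lambda \sim \mathrm{Beta}(\alpha, \alpha), \qquad \tilde{x} = \lambda x_i + (1 - \lambda)\, x_j, \qquad \tilde{y} = \lambda y_i + (1 - \lambda)\, y_j$$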


3. MixMatch



X is the batch of labeled data and U the batch of unlabeled data; X_hat is the processed (augmented) labeled batch and U_hat the processed (augmented) unlabeled batch.
MixMatch produces the processed batches, which feed a combined loss (H is cross-entropy, L the number of classes, and λ_U the unlabeled loss weight):

$$\hat{\mathcal{X}}, \hat{\mathcal{U}} = \text{MixMatch}(\mathcal{X}, \mathcal{U}, T, K, \alpha)$$
$$\mathcal{L}_{\mathcal{X}} = \frac{1}{|\hat{\mathcal{X}}|} \sum_{(x, p) \in \hat{\mathcal{X}}} H\!\left(p,\; p_{\text{model}}(y \mid x; \theta)\right)$$
$$\mathcal{L}_{\mathcal{U}} = \frac{1}{L\,|\hat{\mathcal{U}}|} \sum_{(u, q) \in \hat{\mathcal{U}}} \left\| q - p_{\text{model}}(y \mid u; \theta) \right\|_2^2$$
$$\mathcal{L} = \mathcal{L}_{\mathcal{X}} + \lambda_{\mathcal{U}}\, \mathcal{L}_{\mathcal{U}}$$

3.1 Data augmentation

We apply data augmentation to both labeled and unlabeled data. For each $x_b$ in the labeled batch $\mathcal{X}$, we generate one transformed version $\hat{x}_b = \text{Augment}(x_b)$. For each $u_b$ in the unlabeled batch $\mathcal{U}$, we generate $K$ augmented versions $\hat{u}_{b,k} = \text{Augment}(u_b)$, $k \in \{1, \dots, K\}$.
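A minimal sketch of this step, assuming a hypothetical stochastic `augment` transform operating on tensor batches (not the official code):

```python
def augment_batch(x_batch, u_batch, augment, K=2):
    """Apply one random augmentation per labeled example and K per unlabeled example."""
    x_hat = augment(x_batch)                       # x_hat_b = Augment(x_b)
    u_hat = [augment(u_batch) for _ in range(K)]   # u_hat_{b,k}, k = 1..K
    return x_hat, u_hat
```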

3.2 Label Guessing

For each unlabeled example in U, MixMatch uses the model's predictions to generate a "guess" for that example's label, which is later used in the unsupervised loss term. To do this, we average the class distributions predicted by the model across the K augmentations of $u_b$:

$$\bar{q}_b = \frac{1}{K} \sum_{k=1}^{K} p_{\text{model}}\!\left(y \mid \hat{u}_{b,k}; \theta\right)$$

Sharpening: lowering the entropy of the guessed label distribution makes the prediction more confident. In other words, classes with higher probability are pulled higher, and classes with lower probability are pushed lower.
Given the averaged prediction $\bar{q}_b$, MixMatch applies the sharpening function

$$\text{Sharpen}(p, T)_i := \frac{p_i^{1/T}}{\sum_{j=1}^{L} p_j^{1/T}}$$

where $T$ is a temperature hyperparameter and $L$ is the number of classes. Lowering $T$ encourages lower-entropy predictions; as $T \to 0$, the output approaches a one-hot distribution.
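A minimal NumPy sketch of the label-guessing and sharpening steps (illustrative only, not the official implementation):

```python
import numpy as np

def guess_label(probs_k, T=0.5):
    """Average K predicted class distributions, then sharpen with temperature T."""
    q_bar = np.mean(probs_k, axis=0)              # mean over the K augmentations
    q = q_bar ** (1.0 / T)                        # raise to 1/T (T < 1 sharpens) ...
    return q / q.sum(axis=-1, keepdims=True)      # ... and renormalize

# e.g. K = 2 predicted distributions for one unlabeled example:
preds = np.array([[0.5, 0.4, 0.1],
                  [0.7, 0.2, 0.1]])
print(guess_label(preds))   # ~[0.78, 0.20, 0.02]: the likely class is pulled higher
```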

Pseudocode: read in conjunction with Figure 1
[Algorithm 1: full MixMatch pseudocode from the paper]
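Since the pseudocode image is not reproduced here, the following is a condensed Python sketch of the overall MixMatch procedure as described in the paper (hypothetical `model`, `augment` helpers and a single mixing coefficient per batch; not the official google-research/mixmatch code):

```python
import torch
import torch.nn.functional as F

def mixmatch(model, x, y, u, augment, T=0.5, K=2, alpha=0.75):
    """One MixMatch step: augment, guess labels, sharpen, concatenate, MixUp.

    x: labeled batch, y: one-hot labels (float), u: unlabeled batch.
    Returns mixed inputs/targets for the labeled and unlabeled loss terms.
    """
    # 1) Augment: once for labeled data, K times for unlabeled data.
    x_hat = augment(x)
    u_hat = [augment(u) for _ in range(K)]

    # 2) Label guessing: average predictions over the K augmentations.
    with torch.no_grad():
        q_bar = sum(F.softmax(model(uk), dim=1) for uk in u_hat) / K
        # 3) Sharpen the averaged distribution (temperature T < 1).
        q = q_bar ** (1.0 / T)
        q = q / q.sum(dim=1, keepdim=True)

    # 4) Concatenate and shuffle labeled + unlabeled examples.
    all_x = torch.cat([x_hat] + u_hat, dim=0)
    all_y = torch.cat([y] + [q] * K, dim=0)
    idx = torch.randperm(all_x.size(0))

    # 5) MixUp each example with a random partner; lambda' = max(lambda, 1-lambda)
    #    keeps the mixed example closer to its first argument.
    lam = torch.distributions.Beta(alpha, alpha).sample()
    lam = torch.max(lam, 1 - lam)
    mixed_x = lam * all_x + (1 - lam) * all_x[idx]
    mixed_y = lam * all_y + (1 - lam) * all_y[idx]

    n = x.size(0)
    return mixed_x[:n], mixed_y[:n], mixed_x[n:], mixed_y[n:]
```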

3.3 MixUp
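Per the paper, MixMatch modifies standard MixUp slightly: the augmented labeled examples and the unlabeled examples (with their guessed labels) are concatenated and shuffled, and each example is mixed with a random partner using

$$\lambda \sim \mathrm{Beta}(\alpha, \alpha), \qquad \lambda' = \max(\lambda,\, 1 - \lambda), \qquad x' = \lambda' x_1 + (1 - \lambda')\, x_2, \qquad p' = \lambda' p_1 + (1 - \lambda')\, p_2$$

Taking $\lambda' = \max(\lambda, 1 - \lambda)$ keeps the mixed example closer to its first argument, so the labeled and unlabeled loss terms can still be computed on the appropriate entries of the mixed batch.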


4. Experiments

[Figures: comparison of different methods]

[Table: ablation study — contribution of each component of MixMatch]
[Figures: comparison of semi-supervised algorithm results]


5. Conclusion

Summary of MixMatch's key innovations:
(1) MixMatch integrates consistency regularization, using random horizontal flips and crops for data augmentation.
(2) MixMatch uses a sharpening function to minimize the entropy of predictions on unlabeled data.
(3) MixMatch uses Adam as the optimizer and applies L2 regularization as weight decay.
(4) MixMatch adopts MixUp as a form of data augmentation, mixing labeled and unlabeled examples.


Source: blog.csdn.net/weixin_45751396/article/details/127630249