Noisy Student for Semi-supervised Learning

Basic structure

Noisy Student is similar to pseudo-label learning in semi-supervised learning (see https://blog.csdn.net/weixin_42764932/article/details/112910467).

  1. The images are split into a labeled set and an unlabeled set.

  2. Using labeled data and standard cross-entropy loss, an EfficientNet is trained as a teacher network.

  3. Use the teacher network, without noise, to generate pseudo labels on the unlabeled data. The pseudo labels can be soft labels (a continuous distribution) or hard labels (a one-hot distribution); the article reports that soft labels work better.

  4. Train a student network with noise on both the labeled and pseudo-labeled data, again with cross-entropy loss.

  5. Treat the student network as the new teacher and iterate the steps above: generate new pseudo labels and train a new student (a minimal code sketch of this loop follows the list).
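
A minimal PyTorch sketch of this loop, under loud assumptions: a tiny CNN stands in for the paper's EfficientNets, random tensors stand in for the image sets, and the student's input augmentation is omitted here (see the later sketches). The names `make_model`, `train`, and `pseudo_label` are illustrative, not from the paper.

```python
import torch
import torch.nn.functional as F
from torch import nn

def make_model(num_classes=10, dropout=0.5):
    # Toy stand-in for EfficientNet; the dropout before the classifier is
    # part of the "model noise" applied to the student.
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Dropout(dropout), nn.Linear(16, num_classes),
    )

def train(model, batches, epochs=2, lr=1e-3):
    # Standard cross-entropy training; F.cross_entropy accepts both hard
    # integer labels and soft probability targets (PyTorch >= 1.10).
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()  # dropout stays active: the student is noised
    for _ in range(epochs):
        for x, y in batches:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
    return model

@torch.no_grad()
def pseudo_label(teacher, unlabeled_xs):
    # Step 3: the teacher is NOT noised -- eval() disables dropout and the
    # inputs are clean. A soft label is the full softmax distribution.
    teacher.eval()
    return [(x, F.softmax(teacher(x), dim=1)) for x in unlabeled_xs]

# Fake data so the sketch runs end to end.
labeled = [(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))) for _ in range(4)]
unlabeled_xs = [torch.randn(8, 3, 32, 32) for _ in range(8)]

teacher = train(make_model(), labeled)               # step 2
for _ in range(3):                                   # step 5: iterate
    pseudo = pseudo_label(teacher, unlabeled_xs)     # step 3
    student = train(make_model(), labeled + pseudo)  # step 4
    teacher = student                                # student becomes teacher
```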

The effect of adding noise to student models:

  1. Data noise (augmentation): well known to improve generalization. Because the pseudo label comes from a clean image while the student sees augmented versions, the student is pushed to make the same prediction across different views of the same class; this invariance is what lets it surpass the teacher (see the sketch after this list).

  2. Model noise (dropout, stochastic depth): improves the model's robustness and generalization ability.
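
To make the data-noise mechanism concrete, here is a minimal sketch, assuming torchvision's RandAugment as the input noise (with the magnitude-27 setting from the list below) and a `teacher` model supplied by the caller; `noised_training_pair` is an illustrative helper name, not an API from the paper.

```python
import torch
from torchvision import transforms

augment = transforms.RandAugment(num_ops=2, magnitude=27)  # settings listed below

@torch.no_grad()
def noised_training_pair(img_u8, teacher):
    # img_u8: a (3, H, W) uint8 image tensor.
    # The teacher labels the CLEAN view; the student must reproduce that
    # label from an AUGMENTED view, which forces invariance to the noise.
    clean = img_u8.float().unsqueeze(0) / 255.0
    target = torch.softmax(teacher(clean), dim=1)      # soft pseudo label
    noised = augment(img_u8).float().unsqueeze(0) / 255.0
    return noised, target                              # student input, target

# Example with a dummy stand-in teacher:
teacher = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
img = torch.randint(0, 256, (3, 32, 32), dtype=torch.uint8)
x_student, y_soft = noised_training_pair(img, teacher)
```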

Specific settings:

  1. Stochastic depth: the survival probability is set to 0.8

  2. Dropout: applied to the classification layer with a drop rate of 0.5

  3. RandAugment: two random operations are applied, with the magnitude set to 27

  4. Data filtering: filter out images on which the teacher model has low confidence, since these usually represent out-of-domain images

  5. Data balancing: balance the number of images across classes

  6. The labels output by the teacher model can be 1) soft labels (a continuous distribution, e.g. [0.1, 0.2, 0.6, 0.1]) or 2) hard labels (one-hot, e.g. [0, 0, 1, 0]). Experiments show that soft labels give stronger guidance on out-of-domain images, so the author uses soft labels as the pseudo-label format (a sketch of items 4-6 follows this list).
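
Items 4-6 can be sketched as follows, assuming the teacher's soft labels arrive as a single softmax matrix; `filter_and_balance`, the 0.3 confidence threshold, and the tiny per-class count are illustrative stand-ins (the paper selects a fixed, much larger number of high-confidence images per class).

```python
import torch

def filter_and_balance(images, soft_labels, threshold=0.3, per_class=2):
    # Item 4: drop images the teacher is unsure about (likely out-of-domain).
    conf, hard = soft_labels.max(dim=1)     # confidence and argmax class
    keep = conf >= threshold
    images, soft_labels, hard = images[keep], soft_labels[keep], hard[keep]
    # Item 5: keep (up to) the same number of images per predicted class.
    picked = [(hard == c).nonzero(as_tuple=True)[0][:per_class]
              for c in hard.unique()]
    idx = torch.cat(picked)
    # Item 6: return the SOFT labels, not their one-hot argmax.
    return images[idx], soft_labels[idx]

# Example: 16 fake images with teacher soft labels over 5 classes.
x = torch.randn(16, 3, 32, 32)
p = torch.softmax(torch.randn(16, 5), dim=1)
bx, bp = filter_and_balance(x, p)
```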

Noise, stochastic depth, and data augmentation play an important role in making the student model better than the teacher. Some have asked whether one could instead add a regularization term on the unlabeled data to prevent overfitting, rather than injecting noise. The author's experiments show this explanation is wrong: even with the noise removed, the training loss on the unlabeled images did not decrease much, which indicates that the model was not overfitting the unlabeled data.

Refer to https://blog.csdn.net/qq_39426225/article/details/105571340

Origin: https://blog.csdn.net/weixin_42764932/article/details/112980737