In-depth understanding of deep learning - Regularization: semi-supervised learning

In the framework of semi-supervised learning, both unlabeled samples drawn from $P(x)$ and labeled samples drawn from $P(x, y)$ are used to estimate $P(y \mid x)$ or to predict $y$ from $x$. In the context of deep learning, semi-supervised learning usually refers to learning a representation $h = f(x)$, with the goal that samples from the same class have similar representations. Unsupervised learning can provide useful cues for how to group samples in representation space: samples that cluster tightly in the input space should be mapped to similar representations. In many cases, a linear classifier in the new space then achieves good generalization. A classic variant of this approach is to use principal component analysis as a preprocessing step before classification (applied to the projected data).
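
As a minimal sketch of this classic variant, the snippet below fits PCA on all available inputs (labeled and unlabeled together), so the representation is shaped by $P(x)$, and then trains a linear classifier on the projected labeled data to estimate $P(y \mid x)$. The data is synthetic placeholder data, and the dimensions and component count are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# Hypothetical data: X_unlabeled drawn from P(x), (X_labeled, y) from P(x, y).
rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(1000, 50))   # many unlabeled samples
X_labeled = rng.normal(size=(100, 50))      # few labeled samples
y = rng.integers(0, 2, size=100)            # binary labels

# Unsupervised step: fit PCA on all inputs (labeled + unlabeled), so the
# representation h = f(x) is shaped by P(x) rather than by labels alone.
pca = PCA(n_components=10)
pca.fit(np.vstack([X_unlabeled, X_labeled]))

# Supervised step: a linear classifier on the projected labeled data
# estimates P(y | x) in the new representation space.
clf = LogisticRegression()
clf.fit(pca.transform(X_labeled), y)
```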

Instead of separating the unsupervised and supervised components, we can build a model in which a generative model of either $P(x)$ or $P(x, y)$ shares parameters with a discriminative model of $P(y \mid x)$. We can then trade off the supervised criterion $-\log P(y \mid x)$ against the unsupervised or generative criterion (such as $-\log P(x)$ or $-\log P(x, y)$). The generative criterion expresses a particular form of prior knowledge about the solution to the supervised learning problem: the structure of $P(x)$ is connected to the structure of $P(y \mid x)$ through the shared parameters. By controlling how much of the generative criterion is included in the total criterion, we can obtain a better trade-off than with a purely generative or a purely discriminative training criterion. Salakhutdinov and Hinton describe a method for learning the kernel function of a kernel machine used for regression, in which using unlabeled samples to model $P(x)$ significantly improves the estimate of $P(y \mid x)$.
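
To make the weighted total criterion concrete, here is a minimal PyTorch sketch (an illustration of parameter sharing, not the kernel-machine method of Salakhutdinov and Hinton): an encoder shared by a classification head and a reconstruction head, trained on the weighted sum of a cross-entropy term for $-\log P(y \mid x)$ and a reconstruction term standing in for $-\log P(x)$. The architecture, dimensions, and the weight `lam` are all assumed for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridModel(nn.Module):
    """A shared encoder feeds both a discriminative head (for P(y | x)) and
    a decoder head, used here as a stand-in for a generative model of P(x)."""
    def __init__(self, in_dim=50, hidden=32, n_classes=2):
        super().__init__()
        self.encoder = nn.Linear(in_dim, hidden)  # parameters shared by both criteria
        self.classifier = nn.Linear(hidden, n_classes)
        self.decoder = nn.Linear(hidden, in_dim)

    def forward(self, x):
        h = torch.relu(self.encoder(x))
        return self.classifier(h), self.decoder(h)

model = HybridModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = 0.5  # weight on the generative criterion; controls the trade-off

x = torch.randn(64, 50)          # placeholder batch
y = torch.randint(0, 2, (64,))   # placeholder labels

logits, x_hat = model(x)
# Supervised criterion: cross-entropy, i.e. -log P(y | x).
loss_sup = F.cross_entropy(logits, y)
# Generative criterion: reconstruction error, proportional to -log P(x)
# under a fixed-variance Gaussian assumption on the decoder output.
loss_gen = F.mse_loss(x_hat, x)

loss = loss_sup + lam * loss_gen  # weighted total criterion
opt.zero_grad()
loss.backward()
opt.step()
```

Setting `lam = 0` recovers purely discriminative training of the shared parameters, while a large `lam` approaches purely generative training; intermediate values realize the trade-off described above.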
