Recommender system notes (7): understanding self-supervised learning and contrastive learning

Video learning link:

Basic explanation of Contrastive Learning_哔哩哔哩_bilibili

Background

        Deep-learning models depend heavily on large labeled datasets such as ImageNet. For example, the Vision Transformer network I implemented earlier is strongly affected by the dataset: on small datasets its results are very poor, not even as good as the LeNet-5 network.

        In practice, however, large-scale labeled datasets are often unavailable, for example in medical imaging or in recommender systems with sparse data. How, then, can the feature-extraction ability of a model be improved? Self-supervised learning emerged to address this.

        Self-supervised learning aims to improve a model's feature-extraction ability on unlabeled data by designing auxiliary (pretext) tasks that mine the representational structure of the data itself as the supervisory signal. It is useful to distinguish it from several other learning paradigms:

        1. Supervised: supervised learning learns a function (model parameters) from a given labeled training set, so that when new test data are input the result can be predicted by this function;

        2. Unsupervised: unsupervised learning analyzes the regularities and other characteristics of the data itself from unlabeled data. Unsupervised algorithms fall into two broad categories: methods based on probability-density estimation and methods based on similarity measures between samples;

        3. Semi-supervised: semi-supervised learning lies between supervised and unsupervised learning; only part of the training set is labeled, and model training has to be completed with techniques such as pseudo-label generation;

        4. Weakly supervised: the training data carry only inaccurate or incomplete label information. For example, in object detection the training data may contain only category labels and no bounding-box coordinates.

        Among these concepts, unsupervised and self-supervised learning are the most similar: in both cases the training data are unlabeled. The difference is that self-supervised learning obtains supervisory information by constructing auxiliary tasks and learns new knowledge in the process, whereas unsupervised learning does not mine label information for new tasks from the data.

        Here, auxiliary tasks (pretext tasks) are indirect tasks designed in service of the actual training objective. Their advantage is that they simplify solving the original task; in deep learning they avoid manual labeling of samples and enable unsupervised extraction of semantics. A pretext task can thus be understood as an auxiliary task that helps the target task. Typical pretext tasks include image rotation prediction, image colorization, and image inpainting, which are operations similar to data augmentation.
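        As a concrete illustration of the rotation pretext task, here is a minimal PyTorch sketch; the backbone, helper names, and hyperparameters are my own assumptions, not taken from the original post.

```python
# Sketch of a rotation-prediction pretext task (illustrative names only).
import torch
import torch.nn as nn
import torchvision.models as models

def rotate_batch(images: torch.Tensor):
    """Rotate each image by a random multiple of 90 degrees; the rotation
    index (0-3) serves as a free, self-generated label."""
    labels = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack([torch.rot90(img, k=int(k), dims=(1, 2))
                           for img, k in zip(images, labels)])
    return rotated, labels

encoder = models.resnet18(weights=None)             # backbone to be pretrained (pretrained=False on older torchvision)
encoder.fc = nn.Linear(encoder.fc.in_features, 4)   # 4-way rotation classification head
criterion = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 224, 224)                # dummy unlabeled batch
rotated, labels = rotate_batch(images)
loss = criterion(encoder(rotated), labels)          # the supervision comes from the data itself
loss.backward()
```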

        

Relation

The relationship between contrastive learning and self-supervised learning:

        Self-supervised learning algorithms come in two types: contrastive methods and generative methods. Contrastive learning is a form of self-supervised learning, so it uses no labels. It learns features by constructing positive and negative samples, and how those positive and negative examples are constructed is crucial. For an input sample x, there are samples x+ that are similar to it and samples x- that are dissimilar to it. What contrastive learning needs to do is learn an encoder f that pulls x closer to its positive samples and pushes x away from its negative samples, that is:
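        One common way to write this objective (my rendering, assuming a similarity score s such as the dot product or cosine similarity):

```latex
% s(\cdot,\cdot) is a similarity score, e.g. dot product or cosine similarity
s\bigl(f(x),\, f(x^{+})\bigr) \;\gg\; s\bigl(f(x),\, f(x^{-})\bigr)
```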

Generative Self-Supervised Learning vs. Contrastive Self-Supervised Learning:

        Generative: the encoder is trained to encode the input x into an explicit vector z, and a decoder reconstructs x from z, minimizing the reconstruction error;

        Contrastive: the encoder is trained to encode the input x into an explicit vector z, and similarity between representations is measured (e.g. by maximizing mutual information).
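        The contrast between the two objectives can be sketched with toy modules as follows; the shapes, modules, and loss forms are illustrative assumptions, not a specific paper's method.

```python
# Toy comparison of generative vs. contrastive self-supervised objectives.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Linear(32, 8)    # toy encoder: x -> z
decoder = nn.Linear(8, 32)    # toy decoder: z -> x_hat (generative branch only)

x     = torch.randn(16, 32)                     # anchor samples
x_pos = x + 0.01 * torch.randn_like(x)          # "similar" views (e.g. light augmentations)
x_neg = torch.randn(16, 32)                     # unrelated samples

# Generative: reconstruct x from z and minimize the reconstruction error.
z = encoder(x)
reconstruction_loss = F.mse_loss(decoder(z), x)

# Contrastive: compare representations, pulling positives closer
# and pushing negatives away in the embedding space.
z_pos, z_neg = encoder(x_pos), encoder(x_neg)
contrastive_loss = (F.cosine_similarity(z, z_neg)
                    - F.cosine_similarity(z, z_pos)).mean()
```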

Core idea

        Contrastive learning is a self-supervised learning method for learning the general features of a dataset: without labels, the model is taught which data points are similar and which are different.


        Essentially, contrastive learning lets a machine-learning model observe which pairs of data points are "similar" and which are "dissimilar" in order to understand higher-order characteristics of the data before performing tasks such as classification or segmentation, for example learning to distinguish cats from dogs.

Principle

The process of self-supervised contrastive learning in the image domain can be described in three basic steps:

        (1) For each image in the dataset, apply two randomly composed augmentations (e.g. crop + resize + recolor, resize + recolor, crop + recolor, etc.). We want the model to learn that these two images are "similar", because they are essentially different versions of the same image.

        (2) Feed the two augmented images into a deep-learning model (a large CNN such as ResNet) to create a vector representation for each image. The goal is to train the model to output similar representations for similar images.

        (3) Finally, the similarity of the two vector representations is maximized by minimizing a contrastive loss function.

        This contrastive learning approach can thus be dissected into three main steps: data augmentation, encoding, and loss minimization. A toy end-to-end sketch of the loop is shown below.
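        The following is a minimal, SimCLR-style sketch of the whole loop under my own assumptions (dummy data, arbitrary hyperparameters, and a simplified positives-only loss standing in for the full contrastive loss discussed later); it is illustrative rather than a reference implementation.

```python
# Toy end-to-end loop: (1) augment twice, (2) encode, (3) minimize a contrastive loss.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as T
from torchvision.models import resnet18

augment = T.Compose([T.RandomResizedCrop(96, scale=(0.2, 1.0)),
                     T.ColorJitter(0.4, 0.4, 0.4)])          # step (1) augmentations
encoder = resnet18(weights=None)                             # step (2) big CNN encoder
encoder.fc = nn.Identity()                                   # keep the 512-d representation
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

def simple_contrastive_loss(z1, z2, temperature=0.5):
    """Simplified stand-in that only pulls positive pairs together;
    see the InfoNCE sketch later in this post for a fuller form."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    return -((z1 * z2).sum(dim=1) / temperature).mean()

images = torch.rand(8, 3, 128, 128)                  # one dummy unlabeled batch
view1, view2 = augment(images), augment(images)      # step (1): two "similar" views
z1, z2 = encoder(view1), encoder(view2)              # step (2): vector representations
loss = simple_contrastive_loss(z1, z2)               # step (3): minimize the contrastive loss
optimizer.zero_grad(); loss.backward(); optimizer.step()
```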

        It is worth noting that data augmentation works differently in different domains. In the image domain, perturbations can be broadly divided into two categories: spatial/geometric perturbations and appearance/color perturbations. Spatial/geometric perturbations include, but are not limited to, flipping, rotation, cutout, and crop-and-resize; appearance perturbations include, but are not limited to, color distortion and Gaussian noise. A possible composition is sketched below.
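        One way such a pipeline could be composed with torchvision (the specific transforms and parameter values are my own illustrative choices):

```python
# A possible image-augmentation pipeline mixing geometric and appearance perturbations.
import torchvision.transforms as T

image_augment = T.Compose([
    # spatial / geometric perturbations
    T.RandomResizedCrop(size=224, scale=(0.2, 1.0)),             # crop + resize
    T.RandomHorizontalFlip(p=0.5),                               # flip
    T.RandomRotation(degrees=15),                                # rotation
    # appearance / color perturbations
    T.RandomApply([T.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),   # color distortion
    T.RandomGrayscale(p=0.2),
    T.GaussianBlur(kernel_size=23),                              # blur-style noise
    T.ToTensor(),                                                # assumes a PIL image as input
])

# Two calls on the same image yield two different "positive" views:
# view1, view2 = image_augment(img), image_augment(img)
```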

        Perturbations in natural language processing can likewise be roughly divided into two categories: token-level and embedding-level. Token-level perturbations typically include sentence cropping, deletion of words or word spans, reordering, and synonym replacement; embedding-level perturbations include adding Gaussian noise, dropout, and so on. A toy token-level example follows.
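        A toy token-level perturbation (random word deletion), written under my own assumptions; embedding-level perturbations such as different dropout masks are applied inside the encoder rather than on the raw text.

```python
# Toy token-level augmentation: randomly delete words to create a second "view".
import random

def random_delete(tokens, p=0.1):
    """Drop each token with probability p, keeping at least one token."""
    kept = [t for t in tokens if random.random() > p]
    return kept if kept else [random.choice(tokens)]

sentence = "contrastive learning builds positive and negative pairs".split()
view1, view2 = random_delete(sentence), random_delete(sentence)   # two "positive" views
```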


        In addition, contrastive learning uses several specialized loss functions in different settings:

        1. Original contrastive loss

        2. Triplet loss

        3. InfoNCE loss (a minimal sketch appears below)

        For details, please refer to: Contrastive Learning (1) Introduction_Baiyi Xishu Plum Wine's Blog-CSDN Blog_Contrastive Learning
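        As one concrete example, here is a minimal InfoNCE-style loss with in-batch negatives; this is a sketch under my own assumptions, not the exact formulation from the referenced posts.

```python
# Minimal InfoNCE-style loss: matched rows of z1 and z2 are positives,
# all other rows in the batch act as negatives.
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z1[i] and z2[i] are representations of two views of the same sample."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                      # pairwise cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)    # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# usage with dummy representations
z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
loss = info_nce(z1, z2)
```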

Summary

        Contrastive learning is widely applicable and can be used in many fields such as computer vision, recommender systems, and natural language processing. It effectively alleviates the problem that networks are hard to train and inaccurate when labeled data are insufficient.

Reference video links: Self-supervised learning and contrastive learning in NLP_哔哩哔哩_bilibili

[Paper Notes: Contrastive Learning] 05 Contrastive Learning of Structured World Models_哔哩哔哩_bilibili

Contrastive Learning (1) Introduction_Baiyi Xishu Plum Wine's Blog-CSDN Blog_Contrastive Learning


Original post: blog.csdn.net/qq_46006468/article/details/126066506