[Paper Notes] Data Augmentation Series 1

This article gives a brief introduction to data augmentation, its benefits, and common augmentation methods, and then reviews several representative works on data augmentation:

CutMix (ICCV 2019), ContrastMask (CVPR 2022), BCP (CVPR 2023).

Data Augmentation Introduction & Benefits

What is data augmentation?

Data augmentation is a technique in deep learning that expands the original dataset by generating new training samples from existing data. Augmentation tools transform existing samples into new, distinct ones by varying their parameters (geometry, color, noise, and so on). Data augmentation can be applied to image, text, audio, and video inputs. It comes in two flavors: offline augmentation, where augmented samples are generated ahead of time, stored on disk, and combined with the original data before training; and online augmentation, where transformations are applied on the fly to randomly selected samples in each training batch.
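As an illustration of the online flavor, here is a minimal PyTorch/torchvision sketch (written for this note; the dataset path and transform parameters are placeholders, not from the original post):

```python
# Minimal sketch of online data augmentation with torchvision: random transforms
# are applied on the fly each time a sample is drawn, so every epoch sees
# slightly different variants of the same images.
import torch
from torchvision import datasets, transforms

train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),         # random scale and crop
    transforms.RandomHorizontalFlip(p=0.5),    # random mirror
    transforms.ColorJitter(0.4, 0.4, 0.4),     # brightness / contrast / saturation
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("path/to/train", transform=train_tf)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
```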

What are the benefits of data augmentation?

Proper use of data augmentation can bring the following benefits:

  • Reduce the cost of data acquisition and data labeling.
  • Improve model generalization by exposing the model to more diverse and varied samples.
  • Improve prediction accuracy, since the model is trained on more data.
  • Reduce overfitting.
  • Handle class imbalance by increasing the number of samples in minority classes.

Common data augmentation methods:

To learn more about data augmentation methods, please refer to these blogs:

Various data enhancements in deep learning_m0_61899108's Blog-CSDN Blog

Automatic data enhancement method (with code)_data enhancement code_m0_61899108's blog-CSDN blog

There are many data augmentation methods, and the algorithms themselves are not difficult. The difficulty lies in understanding the motivation behind each method, whether it is simple and effective, how it relates to the task at hand, and how to build a convincing story around it.

CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features, ICCV 2019

Paper: https://arxiv.org/abs/1905.04899

Code: https://github.com/clovaai/CutMix-PyTorch

Comparison of several common data augmentation methods:

  • Mixup: blends two random samples in proportion; the classification target is distributed in the same proportion.
  • Cutout: randomly cuts out a region of the sample and fills it with zero pixel values; the classification target is unchanged.
  • CutMix: also cuts out a region, but instead of filling it with zeros, fills it with pixels from the same region of another training sample; the classification target is distributed according to the area ratio.

The differences between these three augmentations (minimal sketches of Mixup and Cutout follow below; a CutMix sketch appears later under Pseudocode):

  • Cutout and CutMix differ in the pixel values used to fill the removed region;
  • Mixup and CutMix differ in how the two samples are mixed: Mixup interpolates two whole images in proportion, whereas CutMix mixes them by cutting a patch from one image and pasting it into the other, so the mixed image does not look unnatural.
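For concreteness, here are minimal PyTorch sketches of Mixup and Cutout (standard formulations written for this note, not code from the respective papers; `images` is a (B, C, H, W) batch and `labels` a (B,) tensor of class indices):

```python
import torch

def mixup(images, labels, alpha=1.0):
    """Blend two random samples in proportion; the loss mixes both labels by lam."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0))
    mixed = lam * images + (1 - lam) * images[perm]   # proportional pixel blend
    return mixed, labels, labels[perm], lam

def cutout(images, size=16):
    """Zero out a random square in each sample; the labels stay unchanged."""
    b, _, h, w = images.shape
    images = images.clone()
    for i in range(b):
        cy, cx = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
        y1, y2 = max(cy - size // 2, 0), min(cy + size // 2, h)
        x1, x2 = max(cx - size // 2, 0), min(cx + size // 2, w)
        images[i, :, y1:y2, x1:x2] = 0.0
    return images
```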

CutMix Advantages:

  • No uninformative pixels appear during training, which improves training efficiency;
  • It retains the benefit of regional dropout, forcing the model to attend to non-discriminative parts of the object;
  • By requiring the model to recognize objects from partial views and injecting information from other samples into the cut region, it further strengthens the model's localization ability;
  • The mixed image does not look unnatural, which helps classification performance;
  • Training and inference costs remain unchanged.

Algorithm:

 

Pseudocode:
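Below is a minimal PyTorch sketch of the CutMix mixing step (my reconstruction of the idea described above, not the authors' official implementation; see the repository linked earlier for the reference code):

```python
# CutMix sketch: cut a random box out of each image and fill it with the same
# region from another image in the batch; mix the two labels by the area ratio.
import numpy as np
import torch

def rand_bbox(height, width, lam):
    """Sample a box whose area is roughly (1 - lam) of the image."""
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(height * cut_ratio), int(width * cut_ratio)
    cy, cx = np.random.randint(height), np.random.randint(width)
    y1, y2 = np.clip(cy - cut_h // 2, 0, height), np.clip(cy + cut_h // 2, 0, height)
    x1, x2 = np.clip(cx - cut_w // 2, 0, width), np.clip(cx + cut_w // 2, 0, width)
    return y1, y2, x1, x2

def cutmix(images, labels, alpha=1.0):
    lam = np.random.beta(alpha, alpha)
    perm = torch.randperm(images.size(0))
    y1, y2, x1, x2 = rand_bbox(images.size(2), images.size(3), lam)
    images = images.clone()
    images[:, :, y1:y2, x1:x2] = images[perm, :, y1:y2, x1:x2]
    # Adjust lambda to the exact area ratio of the pasted patch.
    lam = 1.0 - ((y2 - y1) * (x2 - x1)) / (images.size(2) * images.size(3))
    return images, labels, labels[perm], lam

# Training: loss = lam * criterion(out, labels_a) + (1 - lam) * criterion(out, labels_b)
```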

Experiment:

 

ContrastMask: Contrastive Learning to Segment Every Thing, CVPR2022

Paper: https://arxiv.org/abs/2203.09775

Code: https://github.com/huiserwang/ContrastMask

Partially supervised instance segmentation is the task of segmenting objects from novel categories after learning only on a limited set of base categories with mask annotations, thereby reducing the heavy annotation burden. The key to this task is building an effective class-agnostic mask segmentation model. Unlike previous methods that learn such a model only on base categories, this paper proposes ContrastMask, which learns the mask segmentation model on both base and novel categories under a unified pixel-level contrastive learning framework. In this framework, annotated masks of base categories and pseudo-masks of novel categories serve as priors for contrastive learning: features from the masked region (foreground) are pulled together and contrasted against features from the background, and vice versa (queries and keys are sampled from the foreground and background pixels of each instance, pulling foreground-foreground and background-background features closer while pushing foreground and background apart). This framework greatly improves the feature distinction between foreground and background, which facilitates learning a class-agnostic mask segmentation model. The method achieves strong results on the COCO dataset.

In this paper, the authors propose ContrastMask, a new partially supervised instance segmentation approach that learns a class-agnostic mask segmentation model on both base and novel categories under a unified pixel-level contrastive learning framework. Within this framework, a novel query-shared pixel-level contrastive loss is designed to fully exploit data from all categories. To this end, annotated masks of base categories, or pseudo-masks of novel categories computed by Class Activation Mapping (CAM), are used as region priors that indicate not only the foreground/background partition but also the shared queries and the positive and negative keys. Concretely, given a batch of training images containing both base and novel categories, two shared queries are built: a foreground query and a background query, obtained by averaging features inside and outside the masked regions (annotated masks and pseudo-masks alike). A dedicated sampling strategy then selects appropriate keys. With the proposed loss, keys inside/outside the masked region are pulled toward the foreground/background shared query and contrasted against keys outside/inside the masked region. Finally, the features learned by the pixel-level contrastive learning framework are fused into a class-agnostic mask head to perform mask segmentation.

Compared with previous methods, ContrastMask has several benefits:

  • It makes full use of the training data, so that training data from novel categories also contributes to optimizing the segmentation model;
  • More importantly, through the unified pixel-level contrastive learning framework, and in particular the queries shared between base and novel categories, it builds a bridge that transfers the segmentation ability learned on base categories to novel categories, continuously improving the feature distinction between foreground and background for both.

Framework: ContrastMask builds on the classic two-stage Mask R-CNN architecture, adding a contrastive learning head (CL Head) that performs unified pixel-level contrastive learning on both base and novel categories. The CL Head takes as input the RoI feature map and the CAM generated by the Box Head. It is supervised by a pixel-wise contrastive loss and outputs an enhanced feature map for the Mask Head. Finally, the Mask Head takes the fused feature maps as input and predicts class-agnostic segmentation maps.

Contrastive Learning Head (CL Head): The goal of the CL Head is to increase the feature distinction between foreground and background while reducing the feature differences within each region (foreground or background) across base and novel categories, thereby helping the Mask Head learn. This is achieved by optimizing a novel pixel-level contrastive loss.

Figure 3. Flowchart of the contrastive learning head (CL Head), which consists of an encoder and a projector and is supervised by a pixel-wise contrastive loss. The contrastive loss is computed using the ground-truth mask (for base categories) or the pseudo-mask derived from the CAM (for novel categories).

Query-sharing Pixel-level Contrastive Loss: a new pixel-level loss that learns the mask segmentation model for base and novel categories under a unified contrastive learning framework. Its core design is that base and novel categories share two category-agnostic queries, one for the foreground (q+) and one for the background (q−), which form a bridge that transfers the segmentation ability learned on base categories to novel categories.

Figure 4. Illustration of how queries and example keys are obtained. For base categories, the ground-truth mask is used for partitioning, and edges are extracted to guide the sampling of hard keys. For novel categories, the CAM is first binarized by a threshold δ, then partitioned, and easy and hard keys are randomly sampled according to the partition. The foreground query q+ and background query q− are obtained by averaging the features of the corresponding partitions over a batch of object proposals.
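To make the shared-query design concrete, here is a rough PyTorch sketch of a query-shared pixel-level contrastive loss (a simplified InfoNCE-style variant written for this note, not the authors' implementation; `feats` is an (N, D) matrix of projected pixel features gathered from a batch of proposals, `fg_mask` a boolean (N,) vector marking foreground pixels from ground-truth masks or thresholded CAMs, and `tau` a temperature):

```python
import torch
import torch.nn.functional as F

def query_shared_contrastive_loss(feats, fg_mask, tau=0.07):
    feats = F.normalize(feats, dim=1)
    # Shared queries: mean foreground / background feature over the whole batch.
    q_fg = F.normalize(feats[fg_mask].mean(dim=0), dim=0)
    q_bg = F.normalize(feats[~fg_mask].mean(dim=0), dim=0)

    def pull_push(keys, q_same, q_other):
        # Pull keys toward their own region's query, contrast with the other query.
        pos = torch.exp(keys @ q_same / tau)
        neg = torch.exp(keys @ q_other / tau)
        return -torch.log(pos / (pos + neg)).mean()

    loss_fg = pull_push(feats[fg_mask], q_fg, q_bg)
    loss_bg = pull_push(feats[~fg_mask], q_bg, q_fg)
    return 0.5 * (loss_fg + loss_bg)
```

In the paper, keys are further split into easy and hard ones (hard keys guided by mask edges or the CAM threshold), and the negatives also include keys from the opposite region; the sketch above keeps only the query-sharing idea.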

 

Class-agnostic mask head: The architecture and loss function of the mask head are the same as in Mask R-CNN, except for three modifications: 1) The output channels of the last convolutional layer are changed from 80 to 1, yielding a class-agnostic mask head. 2) The output feature map of the CL Head is concatenated with the input feature map of the mask head, which makes the mask head's input features more distinctive and eases its learning. 3) The CAM is used to tell the mask head which region to focus on, which is easily achieved by adding the CAM to the input feature map.

Figure 5. The input of the class-agnostic mask head consists of the enhanced feature map Y, the RoI feature map X, and the CAM A.
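A minimal sketch of how that input fusion might look (my reading of the three modifications above; the 1x1 fusion convolution and channel sizes are assumptions, not taken from the paper):

```python
import torch
import torch.nn as nn

class MaskHeadInputFusion(nn.Module):
    """Fuse the RoI feature map X with the CL-head output Y and add the CAM A."""
    def __init__(self, x_channels=256, y_channels=256, out_channels=256):
        super().__init__()
        # 1x1 conv brings the concatenated features back to the mask-head width.
        self.fuse = nn.Conv2d(x_channels + y_channels, out_channels, kernel_size=1)

    def forward(self, x, y, cam):
        # x: (B, Cx, H, W) RoI features, y: (B, Cy, H, W) CL-head features,
        # cam: (B, 1, H, W) class activation map used as a spatial prior.
        fused = self.fuse(torch.cat([x, y], dim=1))
        return fused + cam  # CAM broadcast over channels tells the head where to look
```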

Experiment:

 

Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation, CVPR2023

Paper: https://arxiv.org/abs/2305.00673

Code: https://github.com/DeepMed-Lab-ECNU/BCP

In semi-supervised medical image segmentation, there is an empirical mismatch problem between labeled and unlabeled data distributions. If labeled and unlabeled data are treated separately or in an inconsistent manner, knowledge learned from labeled data may be largely discarded.

This paper proposes a simple approach to alleviate this problem: bidirectionally copy-pasting labeled and unlabeled data within a simple Mean Teacher architecture. The method encourages the unlabeled data to learn comprehensive common semantics from the labeled data in both inward and outward directions. More importantly, processing labeled and unlabeled data through the same consistent learning procedure can largely narrow the empirical distribution gap.

In detail, random crops from labeled images (foreground) are copy-pasted onto unlabeled images (background), and random crops from unlabeled images (foreground) are copy-pasted onto labeled images (background). The two blended images are fed into the student network and supervised by a blended supervision signal composed of pseudo-labels and ground-truth labels. The paper finds that this simple bidirectional copy-paste mechanism between labeled and unlabeled data works surprisingly well, yielding significant gains over other state-of-the-art methods on various semi-supervised medical image segmentation datasets.

Figure 1. Illustration of the mismatch problem in the semi-supervised learning setting. Suppose the training set is drawn from the latent distribution in (a). The empirical distributions of the small amount of labeled data and the large amount of unlabeled data are shown in (b) and (c), respectively; it is difficult to construct an accurate distribution of the entire dataset from very little labeled data. (d) With BCP, the empirical distributions of labeled and unlabeled features become consistent. (e) Other methods, such as SSNet [35] or copy-paste across unlabeled data only, cannot address the empirical distribution mismatch. All distributions are kernel density estimates of voxels in ACDC belonging to the myocardium class.

 

In semi-supervised medical image segmentation, labeled and unlabeled data are assumed to come from the same distribution (Fig. 1(a)). In the real world, however, it is hard to estimate the exact distribution from the labeled data because there is so little of it, so there is always an empirical distribution mismatch between the large amount of unlabeled data and the very small amount of labeled data (Fig. 1(b) and (c)). Ideally, semi-supervised segmentation methods should train on labeled and unlabeled data symmetrically, in a consistent manner. But most existing semi-supervised methods handle labeled and unlabeled data under separate learning paradigms, which often discards much of the knowledge learned from labeled data and leaves an empirical distribution mismatch between the two (Fig. 1(e)).

To alleviate the empirical mismatch between labeled and unlabeled data, a successful design is to encourage the unlabeled data to learn comprehensive common semantics from the labeled data, while further aligning the two distributions through a consistent learning strategy for labeled and unlabeled data. This paper achieves this with a simple yet very effective bidirectional copy-paste (BCP) method instantiated in the Mean Teacher framework. Specifically, to train the student network, the input is augmented by copy-pasting a random crop from a labeled image (foreground) onto an unlabeled image (background), and vice versa by copy-pasting a random crop from an unlabeled image (foreground) onto a labeled image (background). The student network is supervised by a signal generated via the same bidirectional copy-paste between the pseudo-labels of the unlabeled images (from the teacher network) and the label maps of the labeled images. These two blended images help the network learn common semantics between labeled and unlabeled data bidirectionally and symmetrically.
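A minimal 2D sketch of the bidirectional copy-paste step described above (written for this note, not the official code; `xl`, `yl` are a labeled image and its label map, `xu`, `pu` an unlabeled image and its teacher pseudo-label, and the crop ratio is an illustrative choice):

```python
import torch

def random_crop_mask(h, w, ratio=0.5):
    """Binary mask that is 1 inside a random box covering `ratio` of each side."""
    mask = torch.zeros(h, w)
    bh, bw = int(h * ratio), int(w * ratio)
    top = torch.randint(0, h - bh + 1, (1,)).item()
    left = torch.randint(0, w - bw + 1, (1,)).item()
    mask[top:top + bh, left:left + bw] = 1
    return mask

def bidirectional_copy_paste(xl, yl, xu, pu):
    m = random_crop_mask(xl.shape[-2], xl.shape[-1])
    # Labeled crop (foreground) pasted onto the unlabeled image (background).
    x_l2u = xl * m + xu * (1 - m)
    y_l2u = yl * m + pu * (1 - m)   # ground truth inside the crop, pseudo-label outside
    # Unlabeled crop (foreground) pasted onto the labeled image (background).
    x_u2l = xu * m + xl * (1 - m)
    y_u2l = pu * m + yl * (1 - m)   # pseudo-label inside the crop, ground truth outside
    return (x_l2u, y_l2u), (x_u2l, y_u2l)

# Both blended pairs are fed to the student network; the teacher provides pu.
```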

Framework:

Figure 3. Overview of the bidirectional copy-paste framework in the Mean Teacher architecture, using 2D input images for better visualization. The input to the student network is generated by mixing two labeled and two unlabeled images in the proposed bidirectional copy-paste manner. To supervise the student network, the ground-truth labels and the pseudo-labels produced by the teacher network are combined into one supervision signal via the same bidirectional copy-paste, so that the strong supervision from ground-truth labels helps the weaker supervision from pseudo-labels.

Process description: 

 

Algorithm:

Experiment:
