[Deep learning] A list of data augmentation technologies based on deep learning

Over the weekend I put together a simple survey of the data augmentation methods commonly used in the CV field, focusing on several well-received methods that emerged in the deep learning era, along with their third-party and official implementations. So today, Happy will walk you through the data augmentation methods commonly used in deep learning.

In image classification tasks, data augmentation is a commonly used regularization method and has become an indispensable step for improving model performance. From AlexNet, which set off the deep learning craze, to the recent EfficientNet, data augmentation appears everywhere. The methods themselves have also gradually evolved from traditional cropping, rotation, and mirroring to the currently popular NAS-searched augmentations such as AutoAugment and RandAugment.

Take the code from PyTorch's official ImageNet training as an example, shown below. It covers the key data augmentation operations and stages in CV: image decoding, size transformation, mirroring, color space transformation, ToTensor, normalization, and so on.

train_dataset = datasets.ImageFolder(
    traindir,
    transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
        transforms.ToTensor(),
        normalize,
        transforms.RandomErasing()
    ]))

To better introduce and explain them, we can roughly classify data augmentation methods by where and how they act in the pipeline, as follows:

  • Standard data augmentation: generally refers to the methods already in common use before or during the early stage of the deep learning era.

    • Data IO: including ToTensor, ToPILImage, PILToTensor, etc.

    • Image mirroring: including RandomHorizontalFlip, RandomVerticalFlip, RandomRotation, Transpose, etc.

    • Color space transformation: ColorJitter, Grayscale, etc.

    • Image cropping and scaling: Resize, CenterCrop, RandomResizedCrop, TenCrop, FiveCrop, etc.

    • Image linear transformation: LinearTransformation, RandomRotation, RandomPerspective

  • Image transformation category: refers to sets of transform combinations found by NAS-based search, including AutoAugment, RandAugment, Fast AutoAugment, Faster AutoAugment, Greedy AutoAugment, etc.;

  • Image cropping category: refers to Dropout-like data augmentation methods proposed in the deep learning era, including Cutout, RandomErasing, HideAndSeek, GridMask, etc.;

  • Image mixing category: refers to operations performed at the batch level, including Mixup, CutMix, FMix, etc.

Standard data augmentation

This article mainly takes ImageNet classification as the example and assumes the data finally fed into the network has dimensions [3, 224, 224]. During the ImageNet training phase, the data augmentation pipeline can be divided into the following steps (a minimal code sketch follows the list):

  • Image decoding: the commonly used decoders are OpenCV and PIL. Note that their channel orders differ: OpenCV yields BGR while PIL yields RGB. Although using a different decoder at the inference stage will not drastically degrade the model, it still has an effect and may well cost a few points of accuracy;

  • RandCrop/RandResize: crop/resize the decoded image so that images entering the subsequent stages have a uniform size;

  • RandFlip: in classification tasks, the commonly used mirror is the horizontal flip;

  • Normalize: data normalization; this step converts the data type from uint8 to float;

  • Transpose: data layout rearrangement, transforming the input from [224, 224, 3] (HWC) to [3, 224, 224] (CHW);

  • Batch: collects the processed samples into batches; this is generally provided by the deep learning framework.
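For concreteness, here is a minimal sketch of these six steps, assuming OpenCV, NumPy, and PyTorch are available; the function name preprocess and the fixed resize (in place of a random crop) are our own simplifications:

import cv2
import numpy as np
import torch

def preprocess(path, size=224):
    # 1. Image decoding: OpenCV decodes to BGR, so convert to RGB to match PIL
    img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    # 2. RandCrop/RandResize: here simply resize to a fixed 224x224
    img = cv2.resize(img, (size, size))
    # 3. RandFlip: horizontal mirror with probability 0.5
    if np.random.rand() < 0.5:
        img = img[:, ::-1, :]
    # 4. Normalize: uint8 [0, 255] -> float, subtract mean, divide by std
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    img = (img.astype(np.float32) / 255.0 - mean) / std
    # 5. Transpose: [224, 224, 3] (HWC) -> [3, 224, 224] (CHW)
    img = np.ascontiguousarray(img.transpose(2, 0, 1))
    return torch.from_numpy(img)

# 6. Batch: stacking samples into [N, 3, 224, 224] is normally handled by
# the framework's DataLoader; shown explicitly here for clarity:
# batch = torch.stack([preprocess(p) for p in paths])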

On the basis of the aforementioned standard augmentation, researchers have proposed many improved strategies. They are mainly inserted at different points of the pipeline above; by stage, we can roughly divide them into three categories:

  • Operations performed on the image after RandCrop: such as AutoAugment, RandAugment;

  • Operations performed on the image after Transpose: such as Cutout, RandomErasing, HideAndSeek, GridMask;

  • Operations performed on the data after Batch: such as Mixup, CutMix

To better illustrate and compare the above methods, the following figure is used as the running example for a visual comparison of their effects (note: the image comes from the internet; contact us for removal if it infringes).

Image transformation

Here, image transformation refers to transformations applied to the image produced by RandCrop, mainly including AutoAugment, Fast AutoAugment, and RandAugment. There are of course other methods, such as Faster AutoAugment and PBA, but for reasons of space only the two best-known ones (AutoAugment and RandAugment) are briefly introduced and visualized here.

AutoAugment

Paper: https://arxiv.org/abs/1805.09501v1
Code: https://github.com/DeepVoltaire/AutoAugment

Before AutoAugment, the data augmentation methods used in image classification, object detection, image restoration, and semantic segmentation were all manually designed, mainly based on traditional techniques. AutoAugment was the first method to apply search techniques to data augmentation.

AutoAugment uses a search algorithm to find, within a search space of image augmentation sub-policies, an augmentation scheme suited to a specific dataset. For ImageNet, the finally searched scheme contains 25 sub-policy combinations, each consisting of two transformations: for every image, one sub-policy is randomly selected, and each transformation in it is then applied (or not) with a certain probability. The effect of AutoAugment processing is shown below.
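The selection mechanism can be sketched as follows. Note that the sub-policies, probabilities, and magnitudes below are toy stand-ins for illustration, not the actual 25 searched ImageNet sub-policies:

import random
from PIL import Image, ImageOps

# Each sub-policy is two (operation, probability, magnitude) triples;
# the values here are made up, not the searched ones
SUB_POLICIES = [
    [("posterize", 0.4, 4), ("rotate", 0.6, 30)],
    [("solarize", 0.6, 128), ("autocontrast", 0.6, None)],
]

OPS = {
    "posterize": lambda img, m: ImageOps.posterize(img, m),
    "rotate": lambda img, m: img.rotate(m),
    "solarize": lambda img, m: ImageOps.solarize(img, m),
    "autocontrast": lambda img, m: ImageOps.autocontrast(img),
}

def auto_augment(img: Image.Image) -> Image.Image:
    # Randomly pick one sub-policy for this image...
    sub_policy = random.choice(SUB_POLICIES)
    # ...then apply each of its two transforms with its own probability
    for op_name, prob, magnitude in sub_policy:
        if random.random() < prob:
            img = OPS[op_name](img, magnitude)
    return img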

RandAugment

Paper: https://arxiv.org/pdf/1909.13719.pdf
Code: https://github.com/heartInsert/randaugment

AutoAugment's search method is brute-force: the optimal policy is searched directly on the target dataset, which entails an enormous amount of computation. The authors of RandAugment argue that AutoAugment has the following two flaws:

  • The performance gain of AutoAugment is limited on large data sets;

  • Because the searched policy is strongly tied to the dataset it was found on, its transferability is relatively poor.

In RandAugment, the authors propose a random augmentation scheme: instead of applying each sub-policy with a specific learned probability as in AutoAugment, all sub-policies are selected with equal probability. The experiments in the paper also show that this augmentation works well even when training large models. The effect of RandAugment processing is shown below.
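A minimal sketch of the idea; the operation list is a small illustrative subset of the paper's roughly 14 ops, and the magnitude-to-parameter mapping is our own:

import random
from PIL import Image, ImageOps

# All ops share one global magnitude m and are drawn uniformly at random
TRANSFORMS = [
    lambda img, m: img.rotate(m * 3),                           # up to ~30 degrees
    lambda img, m: ImageOps.posterize(img, max(1, 8 - m // 2)),
    lambda img, m: ImageOps.solarize(img, 256 - m * 25),
    lambda img, m: ImageOps.autocontrast(img),
]

def rand_augment(img: Image.Image, n: int = 2, m: int = 9) -> Image.Image:
    # Every op has the same selection probability -- no learned policy
    for op in random.choices(TRANSFORMS, k=n):
        img = op(img, m)
    return img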

Note: In addition to the above-mentioned two data augmentation methods based on search, there are other NAS-based data augmentation methods, such as:

  • Fast AutoAugment: https://arxiv.org/abs/1905.00397

  • Faster AutoAugment: https://arxiv.org/abs/1911.06987

  • PBA: https://arxiv.org/pdf/1905.05393.pdf

  • Greedy AutoAugment: https://arxiv.org/abs/1908.00704

Image cropping class

The image cropping class mainly operates on the 224×224 image after Transpose: it crops out regions of the image and sets the pixel values of the cropped area to a specific constant (0 by default). It mainly includes:

  • Cutout

  • RandomErasing

  • HideAndSeek

  • GridMask

Cutout

Paper: https://arxiv.org/abs/1708.04552
Code: https://github.com/uoguelph-mlrg/Cutout

To some extent, Cutout can be understood as an extension of Dropout; the difference is that Dropout acts on the features produced after the image passes through the network, while Cutout occludes the input image directly. The effect after Cutout processing is shown below.
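For concreteness, a minimal Cutout sketch on a CHW tensor; the 56-pixel hole for a 224×224 input is our own illustrative choice (the official code parameterizes both the number and the size of the holes):

import numpy as np
import torch

def cutout(img: torch.Tensor, length: int = 56) -> torch.Tensor:
    # img: CHW float tensor; zero out one random square region
    _, h, w = img.shape
    y, x = np.random.randint(h), np.random.randint(w)
    y1, y2 = max(0, y - length // 2), min(h, y + length // 2)
    x1, x2 = max(0, x - length // 2), min(w, x + length // 2)
    img[:, y1:y2, x1:x2] = 0.0  # the square may be clipped at the border
    return img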

RandomErasing

Paper: https://arxiv.org/pdf/1708.04896.pdf
Code: https://github.com/zhunzhong07/Random-Erasing

RandomErasing is similar to Cutout and likewise aims to address the poor generalization of trained models on occluded data; the authors also point out in the paper that it is complementary to random cropping and random horizontal flipping. The difference from Cutout is that in RandomErasing the image undergoes this preprocessing only with a certain probability, and the size and aspect ratio of the generated mask are also drawn at random according to preset hyperparameters. Note: as the author recalls, RandomErasing can also fill the erased region with noise, which improves the robustness of the model to some extent.
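In practice torchvision ships this transform, so it can simply be appended after ToTensor; value='random' fills the erased patch with noise, matching the note above (the scale/ratio values shown are torchvision's defaults):

from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    # Applied with probability p; mask area and aspect ratio are drawn
    # from `scale` and `ratio`; 'random' fills the region with noise
    transforms.RandomErasing(p=0.5, scale=(0.02, 0.33),
                             ratio=(0.3, 3.3), value='random'),
])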

HideAndSeek

Paper: https://arxiv.org/pdf/1811.02545.pdf
Code: https://github.com/kkanshul/Hide-and-Seek

HideAndSeek divides the image into several patches, and for each patch a mask is generated with a certain probability. The effect after HideAndSeek processing is shown below.
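A minimal sketch; the 4×4 grid and hiding probability of 0.5 are illustrative choices (the paper varies the grid size during training):

import torch

def hide_and_seek(img: torch.Tensor, grid: int = 4, p_hide: float = 0.5) -> torch.Tensor:
    # img: CHW float tensor; hide each of the grid x grid patches
    # independently with probability p_hide
    _, h, w = img.shape
    ph, pw = h // grid, w // grid
    for i in range(grid):
        for j in range(grid):
            if torch.rand(1).item() < p_hide:
                img[:, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw] = 0.0
    return img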

GridMask

Paper: https://arxiv.org/abs/2001.04086
Code: https://github.com/akuxcw/GridMask

This data augmentation work comes from Jiaya Jia's team at the Chinese University of Hong Kong. The authors argue that the earlier Cutout-style methods may (1) delete too much of the main object, causing excessive information loss, or (2) delete too little of it, so the augmentation loses its meaning. Based on these considerations, the authors proposed GridMask: a mask with the same resolution as the original image is generated, randomly shifted, and multiplied with the original image to obtain the augmented image; the size of the mask grid cells is controlled by hyperparameters. Note: the masked squares in GridMask are regularly distributed, so the main object is never deleted excessively. The effect after GridMask processing is shown below.
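A minimal sketch of the mask generation; the cell size d and keep ratio are illustrative hyperparameters, and only a random shift of the grid is shown (the official code randomizes the mask further, e.g. by rotation):

import numpy as np
import torch

def grid_mask(img: torch.Tensor, d: int = 56, ratio: float = 0.6) -> torch.Tensor:
    # img: CHW float tensor. Drop a square of side (d - keep) in every
    # d x d grid cell, so the occluded squares stay evenly spread
    _, h, w = img.shape
    mask = np.ones((h, w), dtype=np.float32)
    keep = int(d * ratio)  # kept extent per cell; the rest is dropped
    off_y, off_x = np.random.randint(d), np.random.randint(d)  # random shift
    for y in range(-d + off_y, h, d):
        for x in range(-d + off_x, w, d):
            y1, y2 = max(0, y + keep), max(0, min(h, y + d))
            x1, x2 = max(0, x + keep), max(0, min(w, x + d))
            mask[y1:y2, x1:x2] = 0.0
    return img * torch.from_numpy(mask)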

Image mixing

Image mixing operates on the data after Batch, mixing samples together. Note that this type of augmentation adjusts not only the input but also the label, and accordingly the loss function. Such methods mainly include the following two:

  • Mixup

  • CutMix

The image transformation and image cropping methods above all operate on a single image, while image mixing fuses two images into one. The main difference among mixing methods lies in how the two images are combined.

Mixup

Paper: https://arxiv.org/pdf/1710.09412.pdf
Code: https://github.com/facebookresearch/mixup-cifar10

Mixup is the earliest proposed image mixing scheme: two different images are blended by weighted interpolation, and the labels are mixed accordingly. There are two implementation styles: (1) mixing within the same batch after batching; (2) mixing across different batches. The figure below shows the effect of mixing within the same batch.
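A within-batch sketch in the style of the official code; alpha = 0.2 is a commonly used value, and the label mixing happens in the loss as shown in the comment:

import numpy as np
import torch

def mixup_batch(x: torch.Tensor, y: torch.Tensor, alpha: float = 0.2):
    # Blend each sample with a randomly chosen partner from the same batch
    lam = np.random.beta(alpha, alpha)
    index = torch.randperm(x.size(0))
    mixed_x = lam * x + (1.0 - lam) * x[index]
    # The label is mixed through the loss rather than one-hot interpolation:
    # loss = lam * criterion(pred, y) + (1 - lam) * criterion(pred, y[index])
    return mixed_x, y, y[index], lam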

CutMix

Paper: https://arxiv.org/pdf/1905.04899v2.pdf
Code: https://github.com/clovaai/CutMix-PyTorch

Unlike Mixup, which directly blends two whole images, CutMix randomly crops an ROI from one image and pastes it over the corresponding region of the current image.
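A within-batch sketch; alpha = 1.0 matches the paper's common setting, and the function and variable names are our own:

import numpy as np
import torch

def cutmix_batch(x: torch.Tensor, y: torch.Tensor, alpha: float = 1.0):
    # Cut an ROI whose area is proportional to (1 - lam) from a shuffled
    # partner and paste it over the same region of each image
    lam = np.random.beta(alpha, alpha)
    index = torch.randperm(x.size(0))
    h, w = x.size(2), x.size(3)
    cut_h, cut_w = int(h * np.sqrt(1.0 - lam)), int(w * np.sqrt(1.0 - lam))
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = np.clip(cy - cut_h // 2, 0, h), np.clip(cy + cut_h // 2, 0, h)
    x1, x2 = np.clip(cx - cut_w // 2, 0, w), np.clip(cx + cut_w // 2, 0, w)
    x[:, :, y1:y2, x1:x2] = x[index, :, y1:y2, x1:x2]
    # Recompute lam from the actual pasted area (the box may be clipped)
    lam = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)
    return x, y, y[index], lam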

Note: besides the Mixup and CutMix discussed here, there are many other related methods, such as FMix; see https://github.com/ecs-vlc/FMix.

Conclusion

This article has taken data augmentation in the CV field as its starting point and introduced several of the more classic data augmentation methods. Beyond those described above there are others, such as the following:

  • Cutblur: a data augmentation method for image super-resolution, https://arxiv.org/abs/2004.00448

  • Attribute Mix: Data augmentation method for fine-grained recognition, https://arxiv.org/abs/2004.02684

  • DADA: a data augmentation method for low-data-regime classification, https://arxiv.org/abs/1809.00981

  • Supermix: Data augmentation method for knowledge distillation, https://arxiv.org/abs/2003.05034

  • BayerAug: A method for raw data augmentation, https://arxiv.org/abs/1904.12945

In addition to the data augmentation methods mentioned above, here are some good data augmentation libraries for CVers. The related links are as follows:

  1. albumentations: This library contains a large number of traditional image data augmentation methods, link: https://github.com/albumentations-team/albumentations

  2. UDA: Unsupervised data augmentation, link: https://github.com/google-research/uda

  3. torchsample: a high-level wrapper around pytorch, including data augmentation, model training, etc., link: https://github.com/ncullen93/torchsample

  4. image_augmentor: Another traditional image data augmentation method, link: https://github.com/codebox/image_augmentor

  5. imgaug: a data augmentation library suitable for classification and detection, link: https://github.com/aleju/imgaug

  6. vidaug: video data augmentation methods, link: https://github.com/okankop/vidaug

  7. pytorch official data augmentation: https://github.com/pytorch/vision/tree/master/torchvision/transforms

  8. Dr. Hang Zhang's open-source Fast AutoAugment: https://github.com/zhanghang1989/Fast-AutoAug-Torch

  9. The official implementation of Fast AutoAugment: https://github.com/kakaobrain/fast-autoaugment

  10. The official implementation of AutoAugment: https://github.com/tensorflow/models/tree/master/research/autoaugment

  11. Paddle official data augmentation: https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/data/imaug

That is my summary of data augmentation for today, along with the related methods and code bases turned up by the survey. It is certainly not exhaustive; if you know of other excellent data augmentation methods, feel free to leave a message to supplement and improve it.
