MMDetection Framework Crash Course Part 07: N Methods of Data Augmentation

1 Why data augmentation is necessary

As we all know, even the most advanced neural network models essentially fit the target output with a stack of linear and nonlinear functions. Since it is curve fitting, more samples generally yield a more accurate fit, which is why the datasets used to train neural networks keep growing larger and larger.

In practice, however, we often have only thousands or even hundreds of samples. Against the millions of parameters in a neural network, it is easy to fall into the trap of overfitting: convergence requires a long training process, and over that process the network encounters the same few images in the training set again and again and simply memorizes them, so it struggles to learn features that generalize. A natural idea follows: can we generate a whole series of images from a single one, expanding our dataset hundreds or thousands of times? That is one of the purposes of data augmentation.

The "shortcut" problem of neural networks: a network has no common sense, so it will always separate two categories in the most "convenient" way it can find.

Suppose you want to train a neural network to tell apples from oranges, but the data you have contains only red apples and green oranges. No matter how many such photos you collect, the network will simply conclude that red things are apples and green things are oranges. This happens all the time in practice: lighting, shooting angle, or any other incidental distinguishing point can become the basis the network uses for classification.

Next, we list the data augmentation methods most commonly used in current classification research, together with their effects.

2 Common misunderstandings about data augmentation

Some people may think: since there are so many augmentation methods, can I get the best result by stacking them all together in one go? The answer is no. The goal of data augmentation is not to pile up transformations mindlessly, but to cover, as far as possible, situations that occur in the real world yet are absent from the original data.

For example, suppose we want to train a network to distinguish the types of cars on the road. Vertical flipping is then, for the most part, a poor augmentation choice: in reality we are unlikely to encounter a car with all its wheels pointing up.


3 Common Data Augmentation Methods

3.1 Random Flip

Random flipping is a very common data augmentation method and includes horizontal and vertical flips. Horizontal flipping is the most commonly used, but depending on the actual target, vertical flipping can also be appropriate.


In MMClassification, most data augmentation methods can be enabled by modifying the pipeline configuration in the config file. Here is a short Python snippet that demonstrates the augmentation effect shown above:

import mmcv
from mmcls.datasets import PIPELINES

# Augmentation config; the Registry mechanism builds the augmentation object
aug_cfg = dict(
    type='RandomFlip',
    flip_prob=0.5,           # flip the image with 50% probability
    direction='horizontal',  # flip direction: horizontal
)
aug = PIPELINES.build(aug_cfg)

img = mmcv.imread("./kittens.jpg")
# To pass information between preprocessing steps, each augmentation
# takes and returns a dict
img_info = {'img': img}
img_aug = aug(img_info)['img']

mmcv.imshow(img_aug)

3.2 Random cropping (RandomCrop)

Crop the image at a random location to the specified size. This augmentation shifts the position of each region within the image while preserving the original image scale.


In MMClassification, the following configuration can be used:

# Only the cfg is shown here; replace the corresponding part of the
# RandomFlip example to preview the effect
aug_cfg = dict(
    type='RandomCrop',
    size=(384, 384),           # crop size
    padding=None,              # border padding width (None: no padding)
    pad_if_needed=True,        # pad the borders automatically if the image is too small
    pad_val=(128, 128, 128),   # padding pixel value
    padding_mode='constant',   # padding mode
)
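To make the crop-and-pad semantics concrete, here is a minimal NumPy sketch of random cropping with optional border padding. The function name and details are illustrative, not MMClassification's actual implementation:

```python
import numpy as np

def random_crop(img: np.ndarray, size: tuple, pad_val: int = 128) -> np.ndarray:
    """Crop an (H, W, C) image to `size` = (h, w) at a random position,
    padding the borders with `pad_val` if the image is too small."""
    th, tw = size
    h, w = img.shape[:2]
    # Pad first if the image is smaller than the crop size
    if h < th or w < tw:
        pad_h, pad_w = max(th - h, 0), max(tw - w, 0)
        img = np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)),
                     mode='constant', constant_values=pad_val)
        h, w = img.shape[:2]
    # Pick a random top-left corner, then slice out the patch
    top = np.random.randint(0, h - th + 1)
    left = np.random.randint(0, w - tw + 1)
    return img[top:top + th, left:left + tw]
```

Any crop position inside the (possibly padded) image is equally likely, which is what moves each region around between epochs.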

3.3 Random crop and resize (RandomResizedCrop)

This method is by now almost the standard augmentation when training classification networks on general image datasets such as ImageNet. Whereas RandomCrop rigidly cuts out a patch of fixed size, RandomResizedCrop crops at a random position with a random area and aspect ratio within a given range, then resizes the patch to a uniform size.

As a result, the image is distorted in proportion to some degree. For classification this is not necessarily a bad thing: after all, you would not mistake a slightly squashed cat for a dog, and this augmentation pushes the network to learn features closer to the essence of the category. In addition, because the crop is taken proportionally, this method handles input images of different resolutions more gracefully.


In MMClassification, the following configuration can be used:

# Only the cfg is shown here; replace the corresponding part of the
# RandomFlip example to preview the effect
aug_cfg = dict(
    type='RandomResizedCrop',
    size=(384, 384),            # target size
    scale=(0.08, 1.0),          # crop area limits (no less than 8% of the original area)
    ratio=(3. / 4., 4. / 3.),   # aspect ratio limits, preventing excessive distortion
    max_attempts=10,            # max retries when the area and ratio constraints cannot both be met
    interpolation='bilinear',   # image resizing algorithm
    backend='cv2',              # resize backend; 'cv2' (OpenCV) and 'pillow' can differ slightly
)
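The core of RandomResizedCrop is how the crop box is sampled. The sketch below illustrates the rejection-sampling logic described above; it is an illustrative approximation, not the MMClassification implementation, and the fallback strategy in particular is simplified:

```python
import math
import numpy as np

def sample_crop_box(h, w, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3),
                    max_attempts=10):
    """Sample a crop box (top, left, crop_h, crop_w) whose area lies in
    scale * (h * w) and whose aspect ratio lies in `ratio`."""
    area = h * w
    for _ in range(max_attempts):
        target_area = np.random.uniform(*scale) * area
        # Sample the aspect ratio log-uniformly so that e.g. 3/4 and 4/3
        # are equally likely
        aspect = math.exp(np.random.uniform(math.log(ratio[0]),
                                            math.log(ratio[1])))
        crop_w = int(round(math.sqrt(target_area * aspect)))
        crop_h = int(round(math.sqrt(target_area / aspect)))
        if crop_w <= w and crop_h <= h:
            top = np.random.randint(0, h - crop_h + 1)
            left = np.random.randint(0, w - crop_w + 1)
            return top, left, crop_h, crop_w
    # Simplified fallback: a centered square crop of the largest valid side
    side = min(h, w)
    return (h - side) // 2, (w - side) // 2, side, side
```

After the box is sampled, the patch is simply sliced out and resized to the target size with the configured interpolation.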

3.4 Color Jitter (ColorJitter)

The most commonly used method for augmenting the color of an image is ColorJitter. It applies random transformations, within a given range, to the image's brightness, contrast, saturation, and hue, simulating the varying lighting conditions and other factors encountered in real-world shooting.


In MMClassification, the following configuration can be used:

# Only the cfg is shown here; replace the corresponding part of the
# RandomFlip example to preview the effect
aug_cfg = dict(
    type='ColorJitter',
    brightness=0.5,    # brightness factor range (0.5 ~ 1.5)
    contrast=0.5,      # contrast factor range (0.5 ~ 1.5)
    saturation=0.5,    # saturation factor range (0.5 ~ 1.5)
    # hue jitter is rarely used; MMClassification does not currently support it
)
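For intuition on what a value like `brightness=0.5` means, here is a minimal NumPy sketch of brightness jittering alone: a factor is drawn uniformly from [1 - 0.5, 1 + 0.5], multiplied into the pixel values, and the result is clipped. The helper name is made up for illustration:

```python
import numpy as np

def jitter_brightness(img: np.ndarray, brightness: float = 0.5) -> np.ndarray:
    """Scale pixel intensities by a factor drawn uniformly from
    [1 - brightness, 1 + brightness] (here 0.5 ~ 1.5), then clip to 0~255."""
    factor = np.random.uniform(max(0.0, 1.0 - brightness), 1.0 + brightness)
    out = img.astype(np.float32) * factor
    return np.clip(out, 0, 255).astype(np.uint8)
```

Contrast and saturation jitter work analogously, blending the image toward its mean intensity or its grayscale version by a similarly sampled factor.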

3.5 Random grayscale (RandomGrayscale)

With a certain probability, the image is converted to grayscale. This augmentation eliminates the influence of color and is useful in specific scenarios.


In MMClassification, the following configuration can be used:

# Only the cfg is shown here; replace the corresponding part of the
# RandomFlip example to preview the effect
aug_cfg = dict(
    type='RandomGrayscale',
    gray_prob=0.5,    # convert the image to grayscale with 50% probability
)

3.6 Random lighting transformation (Lighting)

This data augmentation method for image lighting was proposed in the AlexNet paper: https://dl.acm.org/doi/pdf/10.1145/3065386

In this method, PCA (Principal Component Analysis) is first performed on the pixels of all images in the training set to obtain the eigenvalues and eigenvectors in RGB space. What do these eigenvectors represent? The paper's authors argue that they capture the effect of illumination intensity on pixel values: however varied the image content, every part of a picture is inevitably affected by the lighting conditions.

Since the eigenvectors represent the influence of light intensity, images under different lighting can be simulated by randomly adding to or subtracting from the pixel values along the eigenvector directions.
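To adapt this transform to your own data, the eigenvalues and eigenvectors can be estimated with a few lines of NumPy. The sketch below is illustrative; in particular, the scaling convention (PCA on pixels normalized to [0, 1], eigenvalues then multiplied by 255 to match the magnitude of the ImageNet values in the config below) is an assumption that should be checked against the transform's implementation:

```python
import numpy as np

def compute_lighting_pca(images):
    """Estimate RGB eigenvalues/eigenvectors for the Lighting transform
    from a list of (H, W, 3) uint8 images (illustrative sketch)."""
    # Stack every pixel of every image into one (N, 3) matrix of RGB values
    pixels = np.concatenate([im.reshape(-1, 3) for im in images]) / 255.0
    pixels = pixels - pixels.mean(axis=0)       # center each channel
    cov = np.cov(pixels, rowvar=False)          # 3x3 RGB covariance matrix
    eigval, eigvec = np.linalg.eigh(cov)        # eigh returns ascending order
    order = eigval.argsort()[::-1]              # largest component first
    return eigval[order] * 255.0, eigvec[:, order]
```

The transform then adds `eigvec @ (alpha * eigval)` to every pixel, with `alpha` drawn per-image from a zero-mean Gaussian of standard deviation `alphastd`.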


In MMClassification, the following configuration can be used. Pay attention to the eigenvalue and eigenvector settings: for classification in generic scenes you can use the ImageNet values directly, but if your task involves a special lighting environment, you should collect images under different illumination intensities and run PCA on your own dataset to replace the values here.

import mmcv
from mmcls.datasets import PIPELINES

aug_cfg = dict(
    type='Lighting',
    eigval=[55.4625, 4.7940, 1.1475],      # eigenvalues from PCA on the ImageNet training set
    eigvec=[[-0.5675, 0.7192, 0.4009],     # eigenvectors from PCA on the ImageNet training set
            [-0.5808, -0.0045, -0.8140],
            [-0.5836, -0.6948, 0.4203]],
    alphastd=2.0,  # jitter magnitude; set large here for demonstration, usually 0.1
    to_rgb=True,   # mmcv reads images as BGR; convert to RGB to match the eigenvectors
)
aug = PIPELINES.build(aug_cfg)

img = mmcv.imread("./kittens.jpg")
img_info = {'img': img}
img_aug = aug(img_info)['img']
# Lighting outputs a float32 image that can exceed the 0~255 range;
# clip it for visualization
img_aug[img_aug < 0] = 0
img_aug[img_aug > 255] = 255
img_aug = img_aug.astype('uint8')[:, :, ::-1]   # convert back to BGR

mmcv.imshow(img_aug)

The augmentation methods introduced above are only some of the commonly used ones. There are many more, such as automatically combining multiple transforms (AutoAugment, RandAugment) and mixing multiple images together (MixUp, CutMix).
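As a taste of the multi-image methods, here is a minimal NumPy sketch of MixUp, which blends two images and their one-hot labels with a weight drawn from a Beta distribution. The function name and default `alpha` are illustrative, not tied to any particular library:

```python
import numpy as np

def mixup(img_a, img_b, label_a, label_b, alpha=0.2):
    """MixUp: blend two images and their one-hot labels with a mixing
    weight lam ~ Beta(alpha, alpha) (minimal sketch)."""
    lam = np.random.beta(alpha, alpha)
    img = lam * img_a.astype(np.float32) + (1 - lam) * img_b.astype(np.float32)
    label = lam * label_a + (1 - lam) * label_b
    return img, label
```

The network is then trained on the blended image against the blended soft label, which encourages smoother decision boundaries between classes.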


Source: blog.csdn.net/qq_39237205/article/details/131968406