Image transformations and custom transforms for data augmentation

The articles and code are archived in the GitHub repository https://github.com/timerring/dive-into-AI ; they can also be obtained by replying "pytorch tutorial" to the public account AIShareLab.

torchvision.transforms.Pad

torchvision.transforms.Pad(padding, fill=0, padding_mode='constant')

Function: pad the edges of the image

padding: sets the padding size
- When it is a, a pixels are padded on the top, bottom, left, and right
- When it is (a, b), a pixels are padded on the left and right, and b pixels on the top and bottom
- When it is (a, b, c, d), a, b, c, d pixels are padded on the left, top, right, and bottom respectively
padding_mode: padding mode; there are 4 modes: constant, edge, reflect, symmetric
fill: when padding_mode is constant, sets the fill pixel value, (R, G, B) or (Gray)

torchvision.transforms.ColorJitter

torchvision.transforms.ColorJitter(brightness=0, contrast=0, saturation=0, hue=0)

Function: adjust brightness, contrast, saturation, and hue. Photos may show color deviations caused by the equipment and the lighting, so these attributes are adjusted to offset the disturbance caused by these factors.

brightness: brightness adjustment factor
contrast: contrast parameter
saturation: saturation parameter
For brightness, contrast, and saturation:
- When it is a, the factor is randomly selected from [max(0, 1-a), 1+a];
- When it is (a, b), the factor is selected from [a, b].
hue: hue parameter
- When it is a, the parameter is selected from [-a, a], where $0 \le a \le 0.5$.
- When it is (a, b), the parameter is selected from [a, b], where $-0.5 \le a \le b \le 0.5$.

transforms.Grayscale(RandomGrayscale)

torchvision.transforms.Grayscale(num_output_channels=1)

Function: convert image to grayscale image

num_output_channels: the number of output channels. It can only be set to 1 or 3 (if transforms.Normalize is applied afterwards with a 3-channel mean and std, it should be set to 3 so that the channel counts match)
Grayscale is a special case of RandomGrayscale, namely the case p = 1.

torchvision.transforms.RandomGrayscale(p=0.1)

p: probability value, the probability that the image is converted to a grayscale image
The number of output channels follows the input: a 3-channel input yields a 3-channel output with R = G = B, and a 1-channel input yields a 1-channel output

Function: Convert the image to a grayscale image according to a certain probability.

transforms.RandomAffine

torchvision.transforms.RandomAffine(degrees, translate=None, scale=None, shear=None, resample=False, fillcolor=0)

Function: perform an affine transformation on the image. An affine transformation is a 2-dimensional linear transformation composed of five basic operations: rotation, translation, scaling, shearing, and flipping.

degrees: rotation angle setting
translate: translation interval setting, such as (a, b), where a applies to the width and b to the height. The horizontal translation dx satisfies $-img_{width} \times a < dx < img_{width} \times a$ ; the vertical translation is bounded in the same way by the height and b.
scale: scaling factor interval, such as (a, b); the factor is randomly selected from [a, b]
fillcolor: fill color setting (renamed fill in newer torchvision releases)
shear: shear angle setting; there are horizontal and vertical shears. For example, a horizontal shear (along the x-axis) keeps the edges parallel to the x-axis parallel and slants the edges along the y-axis, so that the whole picture resembles a parallelogram.
- If it is a, only shear along the x-axis, with the angle randomly selected from (-a, a)
- If it is (a, b), only shear along the x-axis, with the angle randomly selected from (a, b)
- If it is (a, b, c, d), the x-axis shear angle is randomly selected from (a, b), and the y-axis shear angle from (c, d)
resample: resampling method: NEAREST, BILINEAR, or BICUBIC (renamed interpolation in newer torchvision releases)

transforms.RandomErasing

torchvision.transforms.RandomErasing(p=0.5, scale=(0.02, 0.33), ratio=(0.3, 3.3), value=0, inplace=False)

The default values above are the ranges recommended in the Random Erasing paper.

Function: randomly occlude part of the image. The input to this operation is a tensor, so transforms.ToTensor() must be executed before it; if the pipeline also contains a transforms.ToTensor() after it, comment that one out.

p: probability value, the probability of performing the operation
scale: the area of the occluded region relative to the image. Given (a, b), a proportion in (a, b) is randomly selected
ratio: the aspect ratio of the occluded region. Given (a, b), an aspect ratio in (a, b) is randomly selected
value: the pixel value of the occluded region: (R, G, B), a single gray value, or a string. Because transforms.ToTensor() was executed beforehand and scaled the pixel values to between 0 and 1, an (R, G, B) value set here should be divided by 255

With transforms.RandomErasing(p=1, scale=(0.02, 0.33), ratio=(0.3, 3.3), value=(254/255, 0, 0)), scale=(0.02, 0.33) sets the range from which the proportion of the occluded area is randomly selected, ratio=(0.3, 3.3) sets the range for its aspect ratio, and the RGB value given by value must be normalized to between 0 and 1.

With transforms.RandomErasing(p=1, scale=(0.02, 0.33), ratio=(0.3, 3.3), value='timerring'), value is a string, so the occluded region is filled with random values. Recent torchvision releases accept only the string 'random' here and raise an error for any other string.

transforms.Lambda

transforms.Lambda(lambd)

lambd: a lambda (anonymous) function.

Function: define a custom transform.

lambda [arg1 [, arg2, ... , argn]] : expression

The input parameters come before the colon and the expression that processes them comes after it; the value of the expression acts like a return value.

For example, transforms.Lambda is used together with FiveCrop:

transforms.FiveCrop(112),
transforms.Lambda(lambda crops: torch.stack([(transforms.ToTensor()(crop)) for crop in crops]))

transforms.FiveCrop returns a tuple of length 5, so transforms.Lambda is needed to convert the tuple into a 4D tensor.

Combination and selection of transforms

torchvision.transforms.RandomChoice

torchvision.transforms.RandomChoice([transforms1, transforms2, transforms3])

Function: randomly select one transform from a list of transforms and apply it

transforms.RandomApply

torchvision.transforms.RandomApply([transforms1, transforms2, transforms3], p=0.5)

Function: apply a group of transforms with probability p; either all of them are executed or none of them are.

transforms.RandomOrder

transforms.RandomOrder([transforms1, transforms2, transforms3])

Function: apply a group of transforms in a randomly shuffled order.

Custom transforms

Two requirements for custom transforms:

- A transform accepts only one argument and returns one value;
- Mind the inputs and outputs upstream and downstream: the output of the previous transform is the input of the next transform.

As an example, implement salt-and-pepper noise. Salt-and-pepper noise, also known as impulse noise, consists of random white or black pixels; the white pixels are called salt noise and the black pixels pepper noise. The signal-to-noise ratio (SNR) measures the amount of noise; here it is defined as the proportion of unaffected pixels among all pixels in the image.
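Under this definition, an SNR of 0.9 means that about 10% of the pixels are replaced, half by salt and half by pepper. The relation can be checked numerically with a small sketch; the 100x100 mask size and the seeded generator are assumptions chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

snr = 0.9            # proportion of pixels left untouched
noise_pct = 1 - snr  # proportion of noisy pixels
h, w = 100, 100

# 0 = keep the pixel, 1 = salt (white), 2 = pepper (black)
mask = rng.choice((0, 1, 2), size=(h, w), p=[snr, noise_pct / 2, noise_pct / 2])

noise_fraction = np.mean(mask != 0)
salt_fraction = np.mean(mask == 1)
pepper_fraction = np.mean(mask == 2)
print(noise_fraction, salt_fraction, pepper_fraction)
```

The measured fractions fluctuate around 0.1, 0.05, and 0.05 because each pixel is drawn independently.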

Define an AddPepperNoise class as the transform that adds salt-and-pepper noise. The signal-to-noise ratio and the probability are passed to the constructor, the actual logic is executed in __call__(), and the image is returned.

import numpy as np
import random
from PIL import Image

# Custom transform that adds salt-and-pepper noise
class AddPepperNoise(object):
    """Add salt-and-pepper noise.
    Args:
        snr (float): signal-to-noise ratio (proportion of unaffected pixels)
        p (float): probability of applying the operation
    """

    def __init__(self, snr, p=0.9):
        # Both arguments must be floats
        assert isinstance(snr, float) and isinstance(p, float)
        self.snr = snr
        self.p = p

    # This method is called when the transform is applied
    def __call__(self, img):
        """
        Args:
            img (PIL Image): PIL Image
        Returns:
            PIL Image: PIL image.
        """
        # Apply the transform if the random draw is below self.p
        if random.uniform(0, 1) < self.p:
            # Convert the image to an array
            img_ = np.array(img).copy()
            # Get the shape
            h, w, c = img_.shape
            # Proportion of signal pixels
            signal_pct = self.snr
            # Proportion of salt-and-pepper noise = 1 - SNR
            noise_pct = (1 - self.snr)
            # Draw values from (0, 1, 2) with probabilities [signal_pct, noise_pct/2., noise_pct/2.]
            # Salt noise and pepper noise each take half of noise_pct
            # 1 marks salt noise, 2 marks pepper noise
            mask = np.random.choice((0, 1, 2), size=(h, w, 1), p=[signal_pct, noise_pct/2., noise_pct/2.])
            mask = np.repeat(mask, c, axis=2)
            img_[mask == 1] = 255   # salt noise
            img_[mask == 2] = 0     # pepper noise
            # Convert back to an image
            return Image.fromarray(img_.astype('uint8')).convert('RGB')
        # Otherwise return the original image unchanged
        else:
            return img

AddPepperNoise can then be used directly, like any built-in transform.

The complete code is as follows:

# -*- coding: utf-8 -*-
import os
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
import numpy as np
import torch
import random
import torchvision.transforms as transforms
from PIL import Image
from matplotlib import pyplot as plt
from torch.utils.data import DataLoader

path_lenet = os.path.abspath(os.path.join(BASE_DIR, "..", "..", "model", "lenet.py"))
path_tools = os.path.abspath(os.path.join(BASE_DIR, "..", "..", "tools", "common_tools.py"))
assert os.path.exists(path_lenet), "{} does not exist; please put lenet.py in {}".format(path_lenet, os.path.dirname(path_lenet))
assert os.path.exists(path_tools), "{} does not exist; please put common_tools.py in {}".format(path_tools, os.path.dirname(path_tools))

import sys
hello_pytorch_DIR = os.path.abspath(os.path.dirname(__file__)+os.path.sep+".."+os.path.sep+"..")
sys.path.append(hello_pytorch_DIR)

from tools.my_dataset import RMBDataset
from tools.common_tools import set_seed, transform_invert

set_seed(1)  # set the random seed

# Parameter settings
MAX_EPOCH = 10
BATCH_SIZE = 1
LR = 0.01
log_interval = 10
val_interval = 1
rmb_label = {"1": 0, "100": 1}


class AddPepperNoise(object):
    """Add salt-and-pepper noise.
    Args:
        snr (float): signal-to-noise ratio (proportion of unaffected pixels)
        p (float): probability of applying the operation
    """

    def __init__(self, snr, p=0.9):
        assert isinstance(snr, float) and (isinstance(p, float))    # both must be floats
        self.snr = snr
        self.p = p

    def __call__(self, img):
        """
        Args:
            img (PIL Image): PIL Image
        Returns:
            PIL Image: PIL image.
        """
        if random.uniform(0, 1) < self.p:
            img_ = np.array(img).copy()
            h, w, c = img_.shape
            # Proportion of signal pixels
            signal_pct = self.snr
            # Proportion of noise pixels
            noise_pct = (1 - self.snr)
            # 0, 1 and 2 mark signal, salt noise and pepper noise respectively
            mask = np.random.choice((0, 1, 2), size=(h, w, 1), p=[signal_pct, noise_pct/2., noise_pct/2.])
            mask = np.repeat(mask, c, axis=2)
            img_[mask == 1] = 255   # salt noise (white)
            img_[mask == 2] = 0     # pepper noise (black)
            return Image.fromarray(img_.astype('uint8')).convert('RGB')
        else:
            return img


# ============================ step 1/5 data ============================
split_dir = os.path.abspath(os.path.join(BASE_DIR, "..", "..", "data", "rmb_split"))
if not os.path.exists(split_dir):
    raise Exception(r"Data {} does not exist; go back to lesson-06\1_split_dataset.py to generate it".format(split_dir))
train_dir = os.path.join(split_dir, "train")
valid_dir = os.path.join(split_dir, "valid")

norm_mean = [0.485, 0.456, 0.406]
norm_std = [0.229, 0.224, 0.225]


train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    AddPepperNoise(0.9, p=0.5),
    transforms.ToTensor(),
    transforms.Normalize(norm_mean, norm_std),
])

valid_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(norm_mean, norm_std)
])

# Build the dataset instances
train_data = RMBDataset(data_dir=train_dir, transform=train_transform)
valid_data = RMBDataset(data_dir=valid_dir, transform=valid_transform)

# Build the DataLoaders
train_loader = DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)
valid_loader = DataLoader(dataset=valid_data, batch_size=BATCH_SIZE)


# ============================ step 5/5 training ============================
for epoch in range(MAX_EPOCH):
    for i, data in enumerate(train_loader):

        inputs, labels = data   # B C H W

        img_tensor = inputs[0, ...]     # C H W
        img = transform_invert(img_tensor, train_transform)
        plt.imshow(img)
        plt.show()
        plt.pause(0.5)
        plt.close()

Finally, a summary of the transforms methods for data augmentation:

1. Cropping

transforms.CenterCrop
transforms.RandomCrop
transforms.RandomResizedCrop
transforms.FiveCrop
transforms.TenCrop

2. Flip and rotate

transforms.RandomHorizontalFlip
transforms.RandomVerticalFlip
transforms.RandomRotation

3. Image transformation

transforms.Pad
transforms.ColorJitter
transforms.Grayscale
transforms.RandomGrayscale
transforms.RandomAffine
transforms.LinearTransformation
transforms.RandomErasing
transforms.Lambda
transforms.Resize
transforms.ToTensor
transforms.Normalize

4. Operations on transforms

transforms.RandomChoice
transforms.RandomApply
transforms.RandomOrder

To emphasize the principle of data augmentation: make the training set closer to the test set. Whether the change concerns position, gray level, padding, or anything else, it should work in this direction.