Crop, Flip and Rotate Data Augmentation

The articles and code are archived in the GitHub repository https://github.com/timerring/dive-into-AI. They can also be obtained from the public account AIShareLab by replying "pytorch tutorial".

Data Augmentation

Data augmentation transforms the samples in the training set to make the training data richer, which helps the model generalize better.

Tip:

Debug console: the console shares the environment of the code currently being debugged, so variables can be viewed or changed there.

For example, it can be used to check the shape of the input variable.

After the transform operations the image is a tensor: its pixel values lie between 0 and 1, and normalization has changed its mean and standard deviation, so it can no longer be displayed directly as a normal image. A helper method transform_invert() is therefore defined. It denormalizes the tensor and converts it back to an image for easy visualization.
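The denormalization step can be illustrated on a single RGB pixel. This is only a minimal pure-Python sketch, not the tutorial's transform_invert(): the function name denormalize_pixel and the mean and standard deviation values are assumptions chosen for illustration.

```python
def denormalize_pixel(normalized, mean, std):
    """Invert per-channel normalization and map the pixel back to 0-255."""
    # Normalize computes (x - mean) / std, so the inverse is x * std + mean.
    restored = [v * s + m for v, m, s in zip(normalized, mean, std)]
    # Values are now back in [0, 1]; scale to 8-bit integers and clamp.
    return [min(255, max(0, round(v * 255))) for v in restored]


# Assumed statistics (hypothetical values for this example).
norm_mean = [0.485, 0.456, 0.406]
norm_std = [0.229, 0.224, 0.225]

original = [0.2, 0.4, 1.0]  # one pixel in the [0, 1] range
normalized = [(v - m) / s for v, m, s in zip(original, norm_mean, norm_std)]
recovered = denormalize_pixel(normalized, norm_mean, norm_std)
print(recovered)
```

The recovered values are the original pixel scaled to the 0-255 range, which is what an image viewer expects.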

The main modification is to the contents of transforms.Compose in the code block. transforms.Resize((224, 224)) scales the image to (224, 224), and the other transform operations are performed afterwards.

Cropping

transforms.CenterCrop

torchvision.transforms.CenterCrop(size)

Function: Crop the image around its center

  • size: the size of the cropped output

CenterCrop crops around the image center. If the crop size is smaller than the original size, the central part of the image is kept. If it is larger, the extra area is filled with 0-valued pixels (that is, black).
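Both cases can be sketched on a small 2-D list of pixel values. This is a simplified pure-Python illustration under the assumption of symmetric padding; the helper name center_crop is hypothetical, and torchvision's exact rounding of offsets may differ.

```python
def center_crop(grid, out_h, out_w, fill=0):
    """Center-crop a 2-D list of pixels, zero-padding first if it is too small."""
    in_h, in_w = len(grid), len(grid[0])
    # Pad (roughly symmetrically) when the requested size exceeds the input.
    pad_top = max(out_h - in_h, 0) // 2
    pad_bottom = max(out_h - in_h, 0) - pad_top
    pad_left = max(out_w - in_w, 0) // 2
    pad_right = max(out_w - in_w, 0) - pad_left
    full_w = pad_left + in_w + pad_right
    padded = [[fill] * full_w for _ in range(pad_top)]
    padded += [[fill] * pad_left + list(row) + [fill] * pad_right for row in grid]
    padded += [[fill] * full_w for _ in range(pad_bottom)]
    # Take the central out_h x out_w window.
    top = (len(padded) - out_h) // 2
    left = (full_w - out_w) // 2
    return [row[left:left + out_w] for row in padded[top:top + out_h]]


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]

small = center_crop(image, 2, 2)  # crop smaller than the image
large = center_crop(image, 6, 6)  # crop larger than the image
print(small)
print(large[1])
```

With a 2x2 crop only the central pixels remain, while the 6x6 crop returns the whole image surrounded by a one-pixel border of zeros.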

transforms.RandomCrop

torchvision.transforms.RandomCrop(size, padding=None, pad_if_needed=False, fill=0, padding_mode='constant')

Function: Randomly crop a region of the given size from the image. If padding is set, the image is padded first and then randomly cropped.

  • size: crop size

  • padding: set padding size

    • When it is a single number a, a pixels are padded on the left, top, right, and bottom
    • When it is (a, b), a pixels are padded on the left and right, and b pixels on the top and bottom
    • When it is (a, b, c, d), a, b, c, and d pixels are padded on the left, top, right, and bottom respectively
  • pad_if_needed: whether to pad the image when it is smaller than the requested size

  • padding_mode:

    • constant: the pixel value is set by fill

    • edge: fill with the pixel values at the edge of the image

    • reflect: mirror fill; the edge pixel is not repeated. With a padding of 2: [1,2,3,4] -> [3,2,1,2,3,4,3,2]

      Here neither 1 nor 4 is duplicated.

    • symmetric: mirror fill; the edge pixel is also repeated. With a padding of 2: [1,2,3,4] -> [2,1,1,2,3,4,4,3]

  • fill: the pixel value used when padding_mode is constant. The default is 0.
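The four padding modes can be compared on a one-dimensional row of pixels. The following is a pure-Python sketch for illustration only; pad_1d is a hypothetical helper, not part of torchvision, and it assumes the padding is smaller than the row length.

```python
def pad_1d(values, pad, mode="constant", fill=0):
    """Pad a 1-D list on both sides using one of the four padding modes."""
    n = len(values)
    if mode == "constant":
        left, right = [fill] * pad, [fill] * pad
    elif mode == "edge":
        left, right = [values[0]] * pad, [values[-1]] * pad
    elif mode == "reflect":
        # Mirror around the edge pixel; the edge pixel itself is not repeated.
        left = [values[i] for i in range(pad, 0, -1)]
        right = [values[n - 1 - i] for i in range(1, pad + 1)]
    elif mode == "symmetric":
        # Mirror around the image border; the edge pixel is repeated.
        left = [values[i] for i in range(pad - 1, -1, -1)]
        right = [values[n - 1 - i] for i in range(pad)]
    else:
        raise ValueError(f"unknown padding mode: {mode}")
    return left + list(values) + right


row = [1, 2, 3, 4]
for mode in ("constant", "edge", "reflect", "symmetric"):
    print(mode, pad_1d(row, 2, mode))
```

On a real image the same rule is applied along both the horizontal and the vertical axis.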

transforms.RandomResizedCrop

torchvision.transforms.RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3), interpolation=2)

Function: Crop a region of random size and random aspect ratio, then resize it. The crop's area is a random fraction of the original image area drawn from scale, its aspect ratio is drawn from ratio, and the crop is finally resized to size using the chosen interpolation.

  • size: output image size after resizing
  • scale: range of the crop area as a fraction of the original image area. By default a value is drawn from (0.08, 1); the range can be changed.
  • ratio: range of the crop's aspect ratio. By default a value is drawn from (3/4, 4/3), a range in which distortion is not obvious; the range can be changed.
  • interpolation: the interpolation method used when resizing the crop to size. There are three main interpolation methods:
    • PIL.Image.NEAREST
    • PIL.Image.BILINEAR
    • PIL.Image.BICUBIC
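How scale and ratio determine the crop box can be sketched in pure Python. This is a simplified illustration loosely modeled on the sampling described above, not torchvision's actual code: the name sample_crop_box, the retry count, and the whole-image fallback are assumptions.

```python
import math
import random


def sample_crop_box(height, width, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3),
                    rng=None, max_tries=10):
    """Return (top, left, crop_h, crop_w) for a random-area, random-ratio crop."""
    rng = rng or random.Random()
    area = height * width
    log_lo, log_hi = math.log(ratio[0]), math.log(ratio[1])
    for _ in range(max_tries):
        # Area fraction is drawn from `scale`.
        target_area = area * rng.uniform(scale[0], scale[1])
        # Aspect ratio (width / height) is drawn on a log scale from `ratio`.
        aspect = math.exp(rng.uniform(log_lo, log_hi))
        crop_w = int(round(math.sqrt(target_area * aspect)))
        crop_h = int(round(math.sqrt(target_area / aspect)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            top = rng.randint(0, height - crop_h)
            left = rng.randint(0, width - crop_w)
            return top, left, crop_h, crop_w
    # Simplified fallback (an assumption): use the whole image.
    return 0, 0, height, width


box = sample_crop_box(224, 224, rng=random.Random(0))
print(box)
```

The returned box would then be cut from the image and resized to size, which is where the interpolation method comes in.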

transforms.FiveCrop(TenCrop)

torchvision.transforms.FiveCrop(size)
torchvision.transforms.TenCrop(size, vertical_flip=False)

Function: FiveCrop cuts five crops of the given size from the four corners and the center of the image. TenCrop additionally mirrors these five crops horizontally (default) or vertically, giving 10 images.

  • size: cropped image size
  • vertical_flip: whether to flip vertically instead of horizontally

Since both transforms return a tuple in which each element is an image, the tuple must also be converted into a single image tensor. The code is as follows:

transforms.FiveCrop(112),
# Convert each crop to a tensor, then stack the crops along a new dimension 0
transforms.Lambda(lambda crops: torch.stack([transforms.ToTensor()(crop) for crop in crops]))

A Lambda anonymous function is used here: the crops before the colon is the function's input, and the expression after the colon is its return value. The list comprehension [transforms.ToTensor()(crop) for crop in crops] loops over the crops and converts each one to a tensor with ToTensor(), giving a list of length 5. torch.stack then joins this list along a new dimension (dimension 0 by default) to form a single tensor.

Also comment out the last two lines in transforms.Compose:

# transforms.ToTensor(), # ToTensor() expects an Image as input, but ToTensor() has already been applied above
# transforms.Normalize(norm_mean, norm_std), # the tensor is 4-dimensional, so Normalize() cannot be applied
  • In the torchvision version used here, transforms.Normalize() expects a 3-dimensional tensor (the method calls _is_tensor_image() to check this condition and raises an error if it is not met), whereas the FiveCrop pipeline above returns a 4-dimensional tensor, so this line is commented out.

The final tensor shape is [ncrops, c, h, w], and the image visualization code also needs to be modified:

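import math
import matplotlib.pyplot as plt

# Display the FiveCrop and TenCrop images
# (img_tensor, transform_invert and train_transform come from the surrounding tutorial code)
ncrops, c, h, w = img_tensor.shape
columns = 2  # two columns
rows = math.ceil(ncrops / columns)  # number of rows needed
# Convert each tensor ([c, h, w]) back to an image
for i in range(ncrops):
    img = transform_invert(img_tensor[i], train_transform)
    plt.subplot(rows, columns, i + 1)
    plt.imshow(img)
plt.show()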

The five crops are the upper-left corner, upper-right corner, lower-left corner, lower-right corner, and center.
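The positions of the five crops can be computed directly from the image and crop sizes. This is a minimal pure-Python sketch; five_crop_boxes is a hypothetical helper, and the center offset uses integer division, which may round differently from torchvision for odd size differences.

```python
def five_crop_boxes(height, width, crop_h, crop_w):
    """Return (top, left, crop_h, crop_w) boxes for the four corners and center."""
    if crop_h > height or crop_w > width:
        raise ValueError("crop size must not exceed the image size")
    bottom = height - crop_h  # top offset of the lower crops
    right = width - crop_w    # left offset of the right-hand crops
    return {
        "upper_left": (0, 0, crop_h, crop_w),
        "upper_right": (0, right, crop_h, crop_w),
        "lower_left": (bottom, 0, crop_h, crop_w),
        "lower_right": (bottom, right, crop_h, crop_w),
        "center": (bottom // 2, right // 2, crop_h, crop_w),
    }


# A 224x224 image with 112x112 crops, matching the sizes used in this section.
for name, box in five_crop_boxes(224, 224, 112, 112).items():
    print(name, box)
```

With these sizes the four corner crops tile the image exactly, and the center crop overlaps all four of them.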

Flip

transforms.RandomHorizontalFlip(RandomVerticalFlip)

torchvision.transforms.RandomHorizontalFlip(p=0.5)
torchvision.transforms.RandomVerticalFlip(p=0.5)

Function: Flip the image horizontally or vertically with a given probability.

  • p: flip probability

With transforms.RandomHorizontalFlip(p=0.5), each image is flipped horizontally with probability 0.5, so about half of the images are flipped.

With transforms.RandomVerticalFlip(p=1), every image is flipped vertically.
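The role of p can be sketched on a small 2-D list of pixels. This is a pure-Python illustration rather than torchvision's implementation; the function names and the rng parameter are assumptions made for the example.

```python
import random


def random_horizontal_flip(grid, p=0.5, rng=None):
    """Reverse every row (left-right mirror) with probability p."""
    rng = rng or random.Random()
    if rng.random() < p:
        return [row[::-1] for row in grid]
    return [list(row) for row in grid]


def random_vertical_flip(grid, p=0.5, rng=None):
    """Reverse the row order (top-bottom mirror) with probability p."""
    rng = rng or random.Random()
    if rng.random() < p:
        return [list(row) for row in grid[::-1]]
    return [list(row) for row in grid]


image = [[1, 2],
         [3, 4]]

print(random_vertical_flip(image, p=1))    # p=1: always flipped
print(random_horizontal_flip(image, p=0))  # p=0: never flipped
```

Because the decision is made independently for every call, the same image can appear flipped in one epoch and unflipped in the next.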

Rotation

transforms.RandomRotation

torchvision.transforms.RandomRotation(degrees, resample=False, expand=False, center=None, fill=None)

Function: Randomly rotate pictures

  • degrees: range of the rotation angle

    • When it is a single number a, the rotation angle is drawn at random from (-a, a)
    • When it is (a, b), the rotation angle is drawn at random from (a, b)
  • resample: resampling method; the default is usually fine

  • expand: whether to enlarge the output canvas so that the whole rotated image is kept. The enlarged size is computed assuming rotation about the center, so if the rotation point is not the center, some of the image is still lost even with expand=True.

    If expand=True and the batch size is greater than 1, each image in a batch is rotated by a different random angle and therefore has a different shape, which raises the error: Sizes of tensors must match except in dimension 0. So when expand=True, a resize operation is also required afterwards.

  • center: the coordinates of the rotation point. By default the image is rotated about its center. For example, (0, 0) sets the rotation point to the upper-left corner.
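Why expand=True produces images of different shapes can be seen from the bounding box of a rotated rectangle. The sketch below is pure Python under stated assumptions: sample_angle and expanded_size are hypothetical helpers, the rotation is about the center, and the rounding may differ by a pixel from what the image library actually produces.

```python
import math
import random


def sample_angle(degrees, rng=None):
    """Draw a rotation angle from (-a, a) for a number a, or (a, b) for a pair."""
    rng = rng or random.Random()
    if isinstance(degrees, (int, float)):
        low, high = -degrees, degrees
    else:
        low, high = degrees
    return rng.uniform(low, high)


def expanded_size(width, height, angle_deg):
    """Approximate canvas size needed to hold the whole rotated image."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = abs(math.cos(theta)), abs(math.sin(theta))
    new_w = round(width * cos_t + height * sin_t)
    new_h = round(width * sin_t + height * cos_t)
    return new_w, new_h


# Different angles give different output sizes for the same input size.
for angle in (0, 30, 90):
    print(angle, expanded_size(224, 224, angle))
```

Since every sample receives its own angle, the expanded outputs generally differ in size, which is why a resize step is needed before the images can be stacked into a batch.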

Origin blog.csdn.net/m0_52316372/article/details/131621911