The goal of training a machine learning or deep learning model is to obtain a model that generalizes. This requires that the model not overfit the training dataset; in other words, the model should perform well on unseen data. Data augmentation is one of the many ways to reduce overfitting.
Data augmentation is the process of expanding the amount and variety of data used to train a model. By training on more varied versions of the data, we can obtain a model that generalizes better. Augmentation applies to many kinds of data, but this article discusses only image data and introduces the common image augmentation strategies in detail. We will use PyTorch (through torchvision) to implement the image augmentation techniques most often used in computer vision.
Because this article is an introduction to data augmentation techniques, a single image is enough for every demonstration. Let's first look at the code used to display the results.
import PIL.Image as Image
import torch
from torchvision import transforms
import matplotlib.pyplot as plt
import numpy as np
import warnings
def imshow(img_path, transform):
    """
    Function to show data augmentation
    Param img_path: path of the image
    Param transform: data augmentation technique to apply
    """
    img = Image.open(img_path)
    fig, ax = plt.subplots(1, 2, figsize=(15, 4))
    ax[0].set_title(f'Original image {img.size}')
    ax[0].imshow(img)
    img = transform(img)
    ax[1].set_title(f'Transformed image {img.size}')
    ax[1].imshow(img)
Resize/Rescale
This transform adjusts the height and width of the image to a specific size. The code below resizes an image from its original size to 224 x 224.
path = './kitten.jpeg'
transform = transforms.Resize((224, 224))
imshow(path, transform)
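Resize also accepts a single integer instead of a (height, width) pair; in that case the shorter side is matched to that value and the aspect ratio is kept. The size arithmetic can be sketched in plain Python as follows. `resized_size` is a hypothetical helper written for this illustration, not a torchvision function, and truncating the longer side to an integer is an assumption about rounding.

```python
def resized_size(width, height, size):
    """Return (new_width, new_height) for a resize request.

    size is either a (height, width) pair, used as-is, or a single int,
    in which case the shorter side becomes `size` and the longer side is
    scaled by the same ratio (truncated to an int; assumption).
    """
    if isinstance(size, (tuple, list)):
        new_h, new_w = size
        return new_w, new_h
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size


print(resized_size(640, 480, (224, 224)))  # aspect ratio is not preserved
print(resized_size(640, 480, 224))         # aspect ratio is preserved
```

Passing a pair, as in the article, can distort a non-square image, while passing a single integer avoids the distortion at the cost of a non-square output.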
Cropping
This technique selects a portion of the image and returns it as a new image. For example, CenterCrop returns a crop taken from the center of the image.
transform = transforms.CenterCrop((224, 224))
imshow(path, transform)
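To see which pixels a center crop keeps, the crop box can be computed by hand. This is only a sketch: `center_crop_box` is a hypothetical helper, it assumes the image is at least as large as the crop, and it uses integer division where a library may round differently.

```python
def center_crop_box(width, height, crop_width, crop_height):
    """Return the (left, top, right, bottom) box of a centered crop."""
    left = (width - crop_width) // 2
    top = (height - crop_height) // 2
    return left, top, left + crop_width, top + crop_height


# A 640 x 480 image cropped to 224 x 224 keeps only the middle region.
print(center_crop_box(640, 480, 224, 224))
```

Unlike resizing, everything outside this box is discarded, so the crop size should be chosen with the subject's position in mind.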
RandomResizedCrop
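This transform combines cropping and resizing: it crops a random region of the image and then resizes that region to the requested size, here 100 x 300 (height x width).
transform = transforms.RandomResizedCrop((100, 300))
imshow(path, transform)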
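The random region is controlled by two ranges: `scale` (fraction of the image area to keep, default 0.08 to 1.0) and `ratio` (aspect ratio of the crop, default 3/4 to 4/3). A rough sketch of how such a region could be sampled is shown here. The function name `sample_crop`, the retry count, and the fallback to the whole image are assumptions made for illustration and may differ from torchvision's implementation.

```python
import math
import random


def sample_crop(width, height, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3), rng=random):
    """Sample a random crop as (left, top, crop_width, crop_height)."""
    area = width * height
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    for _ in range(10):  # retry count is an assumption
        target_area = area * rng.uniform(*scale)
        aspect = math.exp(rng.uniform(*log_ratio))  # width / height
        crop_w = int(round(math.sqrt(target_area * aspect)))
        crop_h = int(round(math.sqrt(target_area / aspect)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            left = rng.randint(0, width - crop_w)
            top = rng.randint(0, height - crop_h)
            return left, top, crop_w, crop_h
    # Fallback (assumption): keep the whole image if no valid crop was found.
    return 0, 0, width, height


print(sample_crop(640, 480))  # a different box on every call
```

The sampled region is then resized to the requested output size, which is why the result always has the same shape even though the crop does not.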
Flipping
Flip the image horizontally or vertically. The code below applies a horizontal flip to our image with the default probability of 0.5, so on some runs the image is returned unchanged.
transform = transforms.RandomHorizontalFlip()
imshow(path, transform)
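The role of the probability `p` can be illustrated on a tiny image stored as nested lists. This is a toy sketch, not how torchvision stores images; `random_horizontal_flip` here is a hypothetical stand-in written for the example.

```python
import random


def random_horizontal_flip(image, p=0.5, rng=random):
    """Mirror each row of a nested-list image with probability p."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return image


image = [[1, 2, 3],
         [4, 5, 6]]
print(random_horizontal_flip(image, p=1.0))  # always flipped
print(random_horizontal_flip(image, p=0.0))  # never flipped
```

Setting `p=1.0` is handy when a deterministic result is wanted for a demonstration.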
Padding
Padding adds a border of the specified width to the edges of the image. Here we pad each side by 50 pixels.
transform = transforms.Pad((50, 50, 50, 50))
imshow(path, transform)
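Pad accepts a single integer (same border everywhere), a pair (left/right, top/bottom), or four values in the order left, top, right, bottom. The effect on the image size can be sketched as follows; `padded_size` is a hypothetical helper used only for this illustration.

```python
def padded_size(width, height, padding):
    """Return (new_width, new_height) after padding."""
    if isinstance(padding, int):
        left = top = right = bottom = padding
    elif len(padding) == 2:
        left = right = padding[0]
        top = bottom = padding[1]
    else:
        left, top, right, bottom = padding
    return width + left + right, height + top + bottom


print(padded_size(640, 480, (50, 50, 50, 50)))  # same result as padding=50
print(padded_size(640, 480, (10, 20)))          # 10 left/right, 20 top/bottom
```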
Rotation
Rotates the image by a randomly chosen angle. Passing 15 means the angle is sampled from the range -15 to +15 degrees on every call.
transform = transforms.RandomRotation(15)
imshow(path, transform)
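What a rotation does to a single pixel coordinate can be sketched with basic trigonometry. `rotate_point` is a hypothetical helper; it uses the usual mathematical convention, so in image coordinates (y pointing down) the visual direction of rotation may appear reversed.

```python
import math
import random


def rotate_point(x, y, cx, cy, degrees):
    """Rotate (x, y) about (cx, cy) by the given angle in degrees."""
    theta = math.radians(degrees)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(theta) - dy * math.sin(theta),
            cy + dx * math.sin(theta) + dy * math.cos(theta))


max_angle = 15
angle = random.uniform(-max_angle, max_angle)  # a new angle on every call
print(angle, rotate_point(224, 112, 112, 112, angle))
```

Pixels near the corners move the farthest, which is why rotated images show empty corners unless the canvas is expanded or filled.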
Random Affine
This transform applies a random affine transformation while keeping the center of the image fixed. It takes the following parameters:
- degrees: range of rotation angles
- translate: maximum horizontal and vertical translation, as fractions of the image width and height
- scale: range of the scaling factor
- shear: range of the shear angle
- fill: the color used to fill the area outside the transformed image (named fillcolor in older torchvision releases)
transform = transforms.RandomAffine(1, translate=(0.5, 0.5), scale=(1, 1), shear=(1, 1), fill=(255, 255, 255))
imshow(path, transform)
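The idea of a transformation about the image center can be sketched on a single point. This is a simplified illustration that combines only rotation, scaling, and translation; shear is left out for brevity, and the helper `affine_about_center` is hypothetical rather than torchvision's actual matrix construction.

```python
import math


def affine_about_center(x, y, cx, cy, degrees=0.0, scale=1.0, tx=0.0, ty=0.0):
    """Rotate and scale (x, y) about (cx, cy), then translate by (tx, ty)."""
    theta = math.radians(degrees)
    dx, dy = x - cx, y - cy
    new_x = cx + scale * (dx * math.cos(theta) - dy * math.sin(theta)) + tx
    new_y = cy + scale * (dx * math.sin(theta) + dy * math.cos(theta)) + ty
    return new_x, new_y


# Without translation the center maps to itself, whatever the angle or scale.
print(affine_about_center(320, 240, 320, 240, degrees=30, scale=1.5))
# translate=(0.5, 0.5) on a 640 x 480 image allows shifts of up to 320 and 240 pixels.
print(affine_about_center(320, 240, 320, 240, tx=0.5 * 640, ty=0.5 * 480))
```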
Gaussian Blur
Blurs the image with a Gaussian kernel. Here the kernel size is 7 and the standard deviation (sigma) is 3.
transform = transforms.GaussianBlur(7, 3)
imshow(path, transform)
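A Gaussian blur replaces each pixel with a weighted average of its neighbors, where the weights follow a bell curve. The one-dimensional weights for a kernel size of 7 and sigma of 3 can be sketched as follows; `gaussian_kernel_1d` is a hypothetical helper, and a real implementation applies such a kernel along both image axes.

```python
import math


def gaussian_kernel_1d(kernel_size, sigma):
    """Return normalized 1-D Gaussian weights centered on the middle tap."""
    half = (kernel_size - 1) / 2
    weights = [math.exp(-((i - half) ** 2) / (2 * sigma ** 2))
               for i in range(kernel_size)]
    total = sum(weights)
    return [w / total for w in weights]


kernel = gaussian_kernel_1d(7, 3)
print([round(w, 3) for w in kernel])
```

A larger sigma flattens the weights and blurs more strongly; a larger kernel size lets more distant neighbors contribute.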
Grayscale
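Convert a color image to grayscale. Setting num_output_channels=3 returns a three-channel image in which all three channels are equal, so the result still fits models that expect RGB input.
transform = transforms.Grayscale(num_output_channels=3)
imshow(path, transform)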
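Grayscale conversion is a weighted sum of the red, green, and blue channels. The sketch below uses the ITU-R 601-2 luma weights that Pillow documents for its "L" mode; the helper names and the rounding step are assumptions made for illustration.

```python
def to_gray(r, g, b):
    """Weighted sum of RGB using the ITU-R 601-2 luma weights."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def grayscale_pixel(r, g, b, num_output_channels=1):
    """Return a gray pixel with one channel or several identical channels."""
    gray = to_gray(r, g, b)
    return (gray,) * num_output_channels


print(grayscale_pixel(255, 0, 0, num_output_channels=3))
```

Green carries the largest weight because the eye is most sensitive to it, so a pure green pixel comes out brighter than a pure red or blue one.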
Color augmentation, also known as color jittering, is the process of modifying the color properties of an image by changing its pixel values. The following methods are all color-related operations.
Brightness
Changes the brightness of an image. The resulting image is darker or lighter than the original.
transform = transforms.ColorJitter(brightness=2)
imshow(path, transform)
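When ColorJitter receives a single number for brightness, contrast, or saturation, it samples a factor from the range max(0, 1 - value) to 1 + value. The sketch below shows that range and a simple brightness scaling on one 8-bit RGB pixel. Both helpers are hypothetical, and the rounding and clamping details are assumptions.

```python
import random


def jitter_range(value):
    """Range from which a brightness/contrast/saturation factor is sampled."""
    return max(0.0, 1.0 - value), 1.0 + value


def adjust_brightness(pixel, factor):
    """Scale each 8-bit channel by factor and clamp to 0..255."""
    return tuple(min(255, max(0, round(c * factor))) for c in pixel)


low, high = jitter_range(2)  # brightness=2, as in the call above
factor = random.uniform(low, high)
print((low, high), adjust_brightness((100, 150, 200), factor))
```

A factor of 1 leaves the image unchanged, 0 gives a black image, and values above 1 brighten it until channels saturate at 255.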
Contrast
The degree of distinction between the darkest and brightest parts of an image is known as contrast. The contrast of the image can also be adjusted as an augmentation.
transform = transforms.ColorJitter(contrast=2)
imshow(path, transform)
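One common way to change contrast is to move every value away from or toward the mean intensity of the image. A minimal sketch on a handful of gray values, assuming this mean-blend formulation (the helper `adjust_contrast` is hypothetical):

```python
def adjust_contrast(gray_values, factor):
    """Push values away from (factor > 1) or toward (factor < 1) their mean."""
    mean = sum(gray_values) / len(gray_values)
    return [min(255, max(0, round(mean + factor * (v - mean))))
            for v in gray_values]


values = [50, 100, 150, 200]
print(adjust_contrast(values, 0.0))  # every value collapses to the mean
print(adjust_contrast(values, 2.0))  # distances from the mean double
```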
Saturation
Saturation describes how intense or pure the colors in a picture are; a fully desaturated image is grayscale.
transform = transforms.ColorJitter(saturation=20)
imshow(path, transform)
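Saturation can be adjusted by blending each pixel with its own gray value: a factor of 0 gives grayscale and factors above 1 exaggerate the colors. The following is a sketch under that assumption, with a hypothetical helper `adjust_saturation` and the same luma weights commonly used for grayscale conversion.

```python
def adjust_saturation(pixel, factor):
    """Blend an RGB pixel with its gray value; factor 0 gives grayscale."""
    r, g, b = pixel
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    return tuple(min(255, max(0, round(gray + factor * (c - gray))))
                 for c in pixel)


print(adjust_saturation((200, 100, 50), 0.0))  # gray
print(adjust_saturation((200, 100, 50), 1.0))  # unchanged
print(adjust_saturation((200, 100, 50), 2.0))  # more vivid
```

With a value as large as 20, the sampled factor can be very high, so most channels end up clamped at 0 or 255 and the image looks posterized.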
Hue
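Hue is the shade of color in a picture. ColorJitter requires the hue value to be between 0 and 0.5 (a larger value such as 2 raises an error), so we use the maximum, 0.5.
transform = transforms.ColorJitter(hue=0.5)
imshow(path, transform)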
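Hue can be thought of as a position on a color wheel, and ColorJitter samples a shift from the range -hue to +hue, where hue may be at most 0.5 (half the wheel in either direction). A sketch of such a shift on a single pixel, using the standard library's `colorsys` module; the helper `shift_hue` is hypothetical and the rounding is an assumption.

```python
import colorsys


def shift_hue(pixel, hue_shift):
    """Rotate the hue of an 8-bit RGB pixel by a fraction of the color wheel."""
    r, g, b = (c / 255 for c in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    h = (h + hue_shift) % 1.0
    return tuple(round(c * 255) for c in colorsys.hsv_to_rgb(h, s, v))


print(shift_hue((255, 0, 0), 0.5))      # red moves to the opposite side of the wheel
print(shift_hue((128, 128, 128), 0.5))  # gray has no hue, so it does not change
```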
Summary
Variations of the training images help the model generalize to unseen data so that it does not overfit the training set. The techniques above are the most common data augmentation techniques. Torchvision contains many more, which can be found in its documentation: https://pytorch.org/vision/stable/transforms.html