CV task2: data reading and data augmentation

1. Data augmentation

1.1 Data augmentation in CV

  • 1. Image data is easy to expand: rotating, scaling, cropping, filtering, or transforming the color space of a picture yields different views of the same subject.
  • 2. Data augmentation improves the accuracy of the model. The subject of an image recognition task is three-dimensional, so viewing it from different directions in the real world produces different views. Augmented data simulates these multi-angle, multi-environment situations, so that the learner sees the subject from several angles and generalizes better.
  • 3. Deep learning models have many parameters and need a large number of training samples. Data augmentation helps when the number of samples is insufficient.

In summary, the purpose of data augmentation is to improve the generalization performance of the model!
Without data augmentation, the features learned by the learner may be very one-sided. For example, if every training picture shows a car from the front, the learner only learns to judge the vehicle type from the front view, not from the characteristics of the car itself.

1.2 Common methods of data augmentation

Data augmentation methods can be grouped by the space they act on: color space, scale space, and sample space. For image classification, augmentation generally leaves the label unchanged and only changes how the same object appears; for object detection, augmentation can change the coordinates of the object; for image segmentation, augmentation can change the per-pixel labels.

Common data augmentation methods: the basic methods transform the color, size, shape, spatial layout, and pixels of the image. These methods can be combined freely to obtain richer augmentations.
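
The effect of a geometric augmentation on the different label types can be sketched with a horizontal flip. This is only a minimal NumPy illustration; the helper names `hflip_image` and `hflip_box` and the toy data are assumptions made for this example, not part of any library.

```python
import numpy as np

def hflip_image(img):
    """Flip an H x W (x C) array left to right."""
    return img[:, ::-1].copy()

def hflip_box(box, img_width):
    """Flip an (x_min, y_min, x_max, y_max) box inside an image of the given width."""
    x_min, y_min, x_max, y_max = box
    return (img_width - x_max, y_min, img_width - x_min, y_max)

# Toy 4 x 6 image whose "object" occupies rows 1..2 and columns 0..1
mask = np.zeros((4, 6), dtype=np.uint8)
mask[1:3, 0:2] = 1
image = mask * 255
class_label = 3
box = (0, 1, 2, 3)  # x_min, y_min, x_max, y_max (max values are exclusive)

flipped_image = hflip_image(image)
flipped_label = class_label                        # classification: label unchanged
flipped_box = hflip_box(box, image.shape[1])       # detection: coordinates change
flipped_mask = hflip_image(mask)                   # segmentation: pixel labels move

print(flipped_label, flipped_box)  # 3 (4, 1, 6, 3)
```

The class label survives the flip untouched, while the box and the mask must be transformed together with the image.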

1.3 Common data augmentation libraries

1.3.1 torchvision

Library introduction:
torchvision is a package separate from pytorch and is installed separately. It is also a commonly used library for data transformation and augmentation. Pay attention to the version when installing, because a torchvision release that does not match the installed pytorch version easily causes errors. The torchvision package consists of popular datasets (torchvision.datasets), model architectures (torchvision.models), and common image transformations for computer vision (torchvision.transforms).

As the data augmentation library officially provided by pytorch, it offers the basic augmentation methods and integrates seamlessly with torch; however, it has fewer types of augmentation methods, and its speed is moderate.

Common torchvision data augmentation methods:

transforms.CenterCrop: crops the center of the image
transforms.ColorJitter: randomly changes the brightness, contrast, and saturation of the image
transforms.FiveCrop: crops the four corners and the center of the image to obtain five images
transforms.Grayscale: converts the image to grayscale
transforms.Pad: pads the image borders with a fixed value
transforms.RandomAffine: random affine transformation
transforms.RandomCrop: random area cropping
transforms.RandomHorizontalFlip: random horizontal flip
transforms.RandomRotation: random rotation
transforms.RandomVerticalFlip: random vertical flip

1.3.2 imgaug

Library introduction:
imgaug is a python library that helps you augment images for your machine learning projects. It converts a set of input images into a new, much larger set of slightly altered images, which is a boon for relatively small data sets.

1.3.3 albumentations

Library introduction:
Albumentations is one of the commonly used libraries for image data augmentation when training deep learning networks, and it is very rich in functionality. The library has the following characteristics:

  • Fast image augmentation built on the highly optimized OpenCV library.
  • A very simple API that covers different image tasks, such as segmentation and detection.
  • Easy to customize.
  • Easy to add to other frameworks, such as PyTorch.

Albumentations official manual

Installation method:

pip install albumentations
# or install the latest version from GitHub:
sudo pip install -U git+https://github.com/albu/albumentations

2. Data reading

2.1 Pillow reads pictures

Read a picture:

from PIL import Image  # import the Pillow library

# Read the picture
im = Image.open('cat.jpg')

Apply a blur filter:

from PIL import Image, ImageFilter

im = Image.open('cat.jpg')
# Apply the blur filter
im2 = im.filter(ImageFilter.BLUR)
im2.save('blur.jpg', 'jpeg')

Create a thumbnail:

from PIL import Image

# Open a jpg image file; the path is relative to the current directory
im = Image.open('cat.jpg')
w, h = im.size
# Shrink the image in place to at most half of its width and height
im.thumbnail((w // 2, h // 2))
im.save('thumbnail.jpg', 'jpeg')

Of course, the above only demonstrates Pillow's most basic operations. Pillow supports many more image operations and is an essential library for image processing.
Pillow's official documentation: https://pillow.readthedocs.io/en/stable/
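
A few more basic Pillow operations are sketched below on a synthetic image created in memory, so no file is needed. The sizes are arbitrary; the point to notice is that Pillow reports the size as (width, height), while the corresponding NumPy array has the shape (height, width, channels).

```python
from PIL import Image
import numpy as np

# Synthetic 120 x 80 (width x height) red RGB image
im = Image.new('RGB', (120, 80), color=(255, 0, 0))

width, height = im.size                  # Pillow order: (width, height)
arr = np.asarray(im)                     # NumPy order: (height, width, channels)

gray = im.convert('L')                   # single-channel grayscale image
small = im.resize((width // 2, height // 2))
rotated = im.rotate(90, expand=True)     # expand=True grows the canvas to fit the result

print(im.size, arr.shape, small.size, rotated.size)
```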

2.2 OpenCV reads pictures

OpenCV is a cross-platform computer vision library, originally open sourced by Intel. It has been developed for a long time and provides many functions for computer vision, digital image processing, and machine vision. OpenCV is much more powerful than Pillow, but it also takes considerably more effort to learn.

Read a picture:

import cv2  # import the OpenCV library

img = cv2.imread('cat.jpg')
# OpenCV's default color channel order is BGR; convert it to RGB
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

Convert to grayscale:

import cv2

img = cv2.imread('cat.jpg')
# Convert to grayscale
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

Canny edge detection:

import cv2

img = cv2.imread('cat.jpg')
# Convert to grayscale
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Canny edge detection with thresholds 30 and 70
edges = cv2.Canny(img, 30, 70)
cv2.imwrite('canny.jpg', edges)

OpenCV contains a large number of image processing functions and covers almost every image-related operation you can think of. In addition, OpenCV has many built-in image feature algorithms, such as key point detection, edge detection, and line detection.
OpenCV official website: https://opencv.org/
OpenCV Github: https://github.com/opencv/opencv
OpenCV extended algorithm library: https://github.com/opencv/opencv_contrib

3. Use pytorch to read data

In Pytorch, data is encapsulated by Dataset and read in parallel by DataLoader. So we only need to override the data reading logic of Dataset to complete the data reading.

import os, sys, glob, shutil, json
import cv2

from PIL import Image
import numpy as np

import torch
from torch.utils.data.dataset import Dataset
import torchvision.transforms as transforms

class SVHNDataset(Dataset):
    def __init__(self, img_path, img_label, transform=None):
        self.img_path = img_path
        self.img_label = img_label 
        self.transform = transform

    def __getitem__(self, index):
        img = Image.open(self.img_path[index]).convert('RGB')

        if self.transform is not None:
            img = self.transform(img)
        
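        # In the original SVHN, class 10 stands for the digit 0;
        # here 10 is used to pad labels shorter than 5 characters
        # (np.int was removed from recent NumPy releases, so use the builtin int)
        lbl = np.array(self.img_label[index], dtype=int)
        lbl = list(lbl) + (5 - len(lbl)) * [10]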
        
        return img, torch.from_numpy(np.array(lbl[:5]))

    def __len__(self):
        return len(self.img_path)

train_path = glob.glob('../input/train/*.png')
train_path.sort()
train_json = json.load(open('../input/train.json'))
train_label = [train_json[x]['label'] for x in train_json]

data = SVHNDataset(train_path, train_label,
          transforms.Compose([
              # Resize to a fixed size
              transforms.Resize((64, 128)),

              # Random color transformation
              transforms.ColorJitter(0.2, 0.2, 0.2),

              # Random rotation
              transforms.RandomRotation(5),

              # Convert the picture to a pytorch tensor
              # transforms.ToTensor(),

              # Normalize the image pixels
              # transforms.Normalize([0.485,0.456,0.406],[0.229,0.224,0.225])
            ]))

With the code above, the images of the competition data and their corresponding labels can be read, and the images are augmented as they are read.
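
The label handling inside `__getitem__` can be checked in isolation. The helper `pad_label` below is a hypothetical name introduced only for this sketch; it repeats the same logic: pad the digit list with 10 up to length 5, then keep at most 5 entries.

```python
def pad_label(digits, length=5, pad_value=10):
    """Pad or truncate a list of digit labels to a fixed length."""
    # A negative repeat count yields an empty list, so long labels are not padded
    padded = list(digits) + (length - len(digits)) * [pad_value]
    return padded[:length]

print(pad_label([1, 9]))              # [1, 9, 10, 10, 10]
print(pad_label([2, 3, 4, 5, 6, 7]))  # [2, 3, 4, 5, 6]
```

Fixed-length labels are what allow the samples to be stacked into a single tensor per batch later on.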

Next, we build a DataLoader on top of the Dataset defined above. Note that Dataset and DataLoader are two different concepts that serve different purposes.

  • Dataset: encapsulates the data set and provides indexed access to individual samples
  • DataLoader: wraps a Dataset and provides iteration over the data in batches
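
The difference between the two can be seen on a toy example. `ToyDataset` is a hypothetical class written only for this sketch: indexing the Dataset returns one sample, while iterating over the DataLoader returns stacked batches.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Hypothetical dataset of 10 samples: sample i is the pair (i, i squared)."""

    def __len__(self):
        return 10

    def __getitem__(self, index):
        return torch.tensor(index), torch.tensor(index ** 2)

dataset = ToyDataset()
single_x, single_y = dataset[3]          # Dataset: one sample, selected by index

# num_workers is left at its default of 0, so samples are loaded in the main process
loader = DataLoader(dataset, batch_size=4, shuffle=False)
batches = [(x, y) for x, y in loader]    # DataLoader: iterate over batches

print(single_y.item(), len(batches), batches[0][1].tolist())  # 9 3 [0, 1, 4, 9]
```

With 10 samples and a batch size of 4, the last batch holds only the 2 remaining samples.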

After adding DataLoader, the data reading code changes as follows:

import os, sys, glob, shutil, json
import cv2

from PIL import Image
import numpy as np

import torch
from torch.utils.data.dataset import Dataset
import torchvision.transforms as transforms

class SVHNDataset(Dataset):
    def __init__(self, img_path, img_label, transform=None):
        self.img_path = img_path
        self.img_label = img_label 
        self.transform = transform

    def __getitem__(self, index):
        img = Image.open(self.img_path[index]).convert('RGB')

        if self.transform is not None:
            img = self.transform(img)
        
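        # In the original SVHN, class 10 stands for the digit 0;
        # here 10 is used to pad labels shorter than 5 characters
        # (np.int was removed from recent NumPy releases, so use the builtin int)
        lbl = np.array(self.img_label[index], dtype=int)
        lbl = list(lbl) + (5 - len(lbl)) * [10]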
        
        return img, torch.from_numpy(np.array(lbl[:5]))

    def __len__(self):
        return len(self.img_path)

train_path = glob.glob('../input/train/*.png')
train_path.sort()
train_json = json.load(open('../input/train.json'))
train_label = [train_json[x]['label'] for x in train_json]

train_loader = torch.utils.data.DataLoader(
        SVHNDataset(train_path, train_label,
                   transforms.Compose([
                       transforms.Resize((64, 128)),
                       transforms.ColorJitter(0.3, 0.3, 0.2),
                       transforms.RandomRotation(5),
                       transforms.ToTensor(),
                       transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
            ])), 
    batch_size=10, # number of samples per batch
    shuffle=False, # whether to shuffle the order
    num_workers=10, # number of worker processes used for reading
)

for data in train_loader:
    break

After adding DataLoader, the data is obtained in batches: for each batch, DataLoader calls Dataset to read the individual samples and stacks them together. At this point, the format of data is:
torch.Size([10, 3, 64, 128]), torch.Size([10, 5])
The former is the batch of images, in the order batchsize * channel * height * width; the latter is the batch of character labels, with 5 entries per sample because __getitem__ returns lbl[:5].
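
How a batch of this shape comes about can be reproduced in plain NumPy. The sketch below is not torchvision's implementation; it only mirrors the arithmetic of ToTensor (height * width * channel uint8 values become channel * height * width floats in [0, 1]) and Normalize (subtract the mean and divide by the standard deviation per channel). The helper name `to_normalized_chw` and the synthetic white images are assumptions made for illustration.

```python
import numpy as np

# Per-channel statistics used in the Normalize call above
mean = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
std = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)

def to_normalized_chw(img_hwc):
    """uint8 H x W x C image -> float C x H x W array, scaled to [0, 1] and normalized."""
    chw = img_hwc.transpose(2, 0, 1).astype(np.float32) / 255.0
    return (chw - mean) / std

# Ten synthetic white 64 x 128 RGB images standing in for one batch
images = [np.full((64, 128, 3), 255, dtype=np.uint8) for _ in range(10)]
batch = np.stack([to_normalized_chw(img) for img in images])

print(batch.shape)  # (10, 3, 64, 128)
```

A white pixel (value 1.0 after scaling) in the first channel becomes (1.0 - 0.485) / 0.229, roughly 2.25, after normalization.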

Origin blog.csdn.net/hu_hao/article/details/106305494