Data reading and data augmentation
1. Data augmentation
1.1 Data augmentation in CV
- 1. Image data is easy to augment: rotating, scaling, cropping, filtering, or transforming the color space of a picture yields different views of the same subject.
- 2. Data augmentation improves model accuracy. The subjects of image recognition are three-dimensional, so viewing them from different directions in the real world produces different images. Augmented data simulates these multi-angle, multi-environment conditions, letting the learner see the subject from many angles and generalize better.
- 3. Deep learning models have many parameters and need a large number of training samples. Data augmentation helps mitigate the problem of too few samples.
In summary, the purpose of data augmentation is to improve the generalization performance of the model.
Without data augmentation, the features learned by the learner may be very one-sided. As shown in the figure below, from this kind of data the learner only learns to judge the vehicle type from the front of the car, rather than from the characteristics of the car itself.
1.2 Common data augmentation methods
Data augmentation methods can be grouped by color space, scale space, and sample space. For image classification, augmentation generally leaves the label unchanged and only shows the same object from a different perspective; for object detection, geometric augmentation changes the coordinates of the objects; for image segmentation, it changes the pixel-level labels.
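To make the detection case concrete, here is a minimal NumPy sketch of a horizontal flip that updates the bounding boxes together with the image. The function name `hflip_with_boxes` and the `[x_min, y_min, x_max, y_max]` pixel box format (with `x_max` exclusive) are assumptions chosen for illustration, not part of any library.

```python
import numpy as np

def hflip_with_boxes(image, boxes):
    """Flip an H x W x C image left-right and mirror its boxes.

    boxes: shape (N, 4) as [x_min, y_min, x_max, y_max] in pixels
    (a hypothetical format chosen for this sketch, x_max exclusive).
    """
    width = image.shape[1]
    flipped = image[:, ::-1].copy()
    boxes = np.asarray(boxes, dtype=np.float32)
    new_boxes = boxes.copy()
    # Mirror the x coordinates; x_min and x_max swap roles after the flip.
    new_boxes[:, 0] = width - boxes[:, 2]
    new_boxes[:, 2] = width - boxes[:, 0]
    return flipped, new_boxes

image = np.zeros((64, 128, 3), dtype=np.uint8)
image[10:30, 20:50] = 255      # a white "object"
boxes = [[20, 10, 50, 30]]     # box around the object
flipped, new_boxes = hflip_with_boxes(image, boxes)
print(new_boxes)
```

For a classification task the same flip would leave the label untouched; only the image changes.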
Common data augmentation methods: the basic methods transform the color, size, shape, spatial layout, and pixels of the image. These methods can be combined freely to obtain richer augmentations.
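As a rough illustration of combining basic methods, the sketch below chains a spatial, a color, and a size transform using only NumPy. All function names and parameter values here are hypothetical and chosen for the example; real projects would normally use one of the libraries introduced in the next section.

```python
import numpy as np

def random_flip(img, rng):
    # Spatial transform: mirror left-right with probability 0.5.
    return img[:, ::-1] if rng.random() < 0.5 else img

def random_brightness(img, rng, max_delta=30):
    # Color transform: shift all pixels by one random offset.
    delta = int(rng.integers(-max_delta, max_delta + 1))
    return np.clip(img.astype(np.int16) + delta, 0, 255).astype(np.uint8)

def random_crop(img, rng, size=(48, 96)):
    # Size transform: cut a random window of the given (height, width).
    h, w = img.shape[:2]
    top = int(rng.integers(0, h - size[0] + 1))
    left = int(rng.integers(0, w - size[1] + 1))
    return img[top:top + size[0], left:left + size[1]]

def augment(img, rng):
    # Chain the basic transforms one after another.
    for step in (random_flip, random_brightness, random_crop):
        img = step(img, rng)
    return img

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 128, 3), dtype=np.uint8)
views = [augment(img, rng) for _ in range(4)]
print([v.shape for v in views])
```

Each call to `augment` draws new random parameters, so one source image yields several different training views.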
1.3 Common data augmentation libraries
1.3.1 torchvision
Library introduction:
torchvision is a package separate from PyTorch and a commonly used library for data transformation and augmentation. Pay attention to the version when installing, because torchvision releases are tied to specific PyTorch releases and mismatched versions easily cause errors. The torchvision package consists of popular datasets (torchvision.datasets), model architectures (torchvision.models), and common image transformations for computer vision (torchvision.transforms).
It is the data augmentation library officially provided by the PyTorch project. It offers basic augmentation methods that integrate seamlessly with torch; however, it offers relatively few kinds of augmentation, and its speed is moderate.
Common torchvision data augmentation methods:
- transforms.CenterCrop: crops the center of the image
- transforms.ColorJitter: randomly changes the brightness, contrast, saturation, and hue of the image
- transforms.FiveCrop: crops the four corners and the center of the image, giving five crops
- transforms.Grayscale: converts the image to grayscale
- transforms.Pad: pads the image borders with a fixed value
- transforms.RandomAffine: random affine transformation
- transforms.RandomCrop: random area cropping
- transforms.RandomHorizontalFlip: random horizontal flip
- transforms.RandomRotation: random rotation
- transforms.RandomVerticalFlip: random vertical flip
1.3.2 imgaug
Library introduction:
This Python library helps you with augmenting images for your machine learning projects. It converts a set of input images into a new, much larger set of slightly altered images, which is a boon for relatively small data sets.
1.3.3 albumentations
Library introduction:
Albumentations is a commonly used library for image data augmentation when training deep networks, and it is rich in functionality. The library has the following characteristics:
- Fast image augmentation built on the highly optimized OpenCV library.
- A very simple API that covers different image tasks, such as segmentation and detection.
- Easy to customize.
- Easy to integrate with other frameworks, such as PyTorch.
Albumentations official manual
Installation method:
pip install albumentations # or: sudo pip install -U git+https://github.com/albu/albumentations
2. Data reading
2.1 Reading pictures with Pillow
Read a picture:
from PIL import Image  # import the Pillow library
im = Image.open('cat.jpg')  # read the picture
Apply a blur filter:
from PIL import Image, ImageFilter
im = Image.open('cat.jpg')
im2 = im.filter(ImageFilter.BLUR)  # apply the blur filter
im2.save('blur.jpg', 'jpeg')
Shrink the picture to a thumbnail:
from PIL import Image
# Open a jpg image file; the path is relative to the current directory
im = Image.open('cat.jpg')
w, h = im.size  # original width and height
im.thumbnail((w // 2, h // 2))  # shrink in place to at most half the size
im.save('thumbnail.jpg', 'jpeg')
Of course, the above only demonstrates Pillow's most basic operations. Pillow supports many more image operations and is an essential library for image processing.
Pillow's official documentation: https://pillow.readthedocs.io/en/stable/
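As a sketch of a few of those further operations, the example below applies rotation, cropping, mirroring, grayscale conversion, and resizing. It builds a synthetic picture with `Image.new` instead of opening 'cat.jpg', which is an assumption made only so the code runs without an image file.

```python
from PIL import Image, ImageOps

# A synthetic picture stands in for 'cat.jpg' so the sketch runs anywhere.
im = Image.new('RGB', (128, 64), color=(200, 120, 40))
w, h = im.size                        # Pillow reports (width, height)

rotated = im.rotate(15)               # rotate by 15 degrees, same canvas size
cropped = im.crop((0, 0, 64, 64))     # box is (left, upper, right, lower)
mirrored = ImageOps.mirror(im)        # flip left-right
gray = im.convert('L')                # single-channel grayscale
half = im.resize((w // 2, h // 2))    # returns a new, smaller image

print(rotated.size, cropped.size, gray.mode, half.size)
```

Unlike `thumbnail`, which modifies the image in place, these calls each return a new image and leave `im` unchanged.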
2.2 Reading pictures with OpenCV
OpenCV is a cross-platform computer vision library, originally open sourced by Intel. It has been developed for a long time and offers a wide range of functions for computer vision, digital image processing, and machine vision. OpenCV is much more powerful than Pillow, and its learning cost is correspondingly higher.
Read a picture and convert it to RGB:
import cv2  # import the OpenCV library
img = cv2.imread('cat.jpg')
# OpenCV's default color channel order is BGR; convert to RGB
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
Convert to grayscale:
import cv2  # import the OpenCV library
img = cv2.imread('cat.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale
Canny edge detection:
import cv2  # import the OpenCV library
img = cv2.imread('cat.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale
edges = cv2.Canny(img, 30, 70)  # Canny edge detection
cv2.imwrite('canny.jpg', edges)
OpenCV contains a large number of image processing functions and covers most operations related to images. It also has many built-in image feature algorithms, such as keypoint detection, edge detection, and line detection.
OpenCV official website: https://opencv.org/
OpenCV Github: https://github.com/opencv/opencv
OpenCV extended algorithm library: https://github.com/opencv/opencv_contrib
3. Reading data with PyTorch
In PyTorch, data is encapsulated by a Dataset and read in parallel by a DataLoader. So we only need to override the data reading logic of the Dataset (its __getitem__ and __len__ methods) to complete data reading.
import os, sys, glob, shutil, json
import cv2
from PIL import Image
import numpy as np
import torch
from torch.utils.data.dataset import Dataset
import torchvision.transforms as transforms

class SVHNDataset(Dataset):
    def __init__(self, img_path, img_label, transform=None):
        self.img_path = img_path
        self.img_label = img_label
        self.transform = transform  # None means no transform is applied

    def __getitem__(self, index):
        img = Image.open(self.img_path[index]).convert('RGB')
        if self.transform is not None:
            img = self.transform(img)
        # In the original SVHN, class 10 stands for the digit 0;
        # here 10 pads labels that are shorter than 5 characters
        # np.int was removed from recent NumPy releases, so use np.int64
        lbl = np.array(self.img_label[index], dtype=np.int64)
        lbl = list(lbl) + (5 - len(lbl)) * [10]
        return img, torch.from_numpy(np.array(lbl[:5]))

    def __len__(self):
        return len(self.img_path)

train_path = glob.glob('../input/train/*.png')
train_path.sort()
train_json = json.load(open('../input/train.json'))
train_label = [train_json[x]['label'] for x in train_json]

data = SVHNDataset(train_path, train_label,
    transforms.Compose([
        # resize to a fixed size
        transforms.Resize((64, 128)),
        # random color transformation
        transforms.ColorJitter(0.2, 0.2, 0.2),
        # add random rotation
        transforms.RandomRotation(5),
        # convert the picture to a PyTorch tensor
        # transforms.ToTensor(),
        # normalize the image pixels
        # transforms.Normalize([0.485,0.456,0.406],[0.229,0.224,0.225])
    ]))
With the above code, the images and corresponding labels of the competition data can be read, and the data is augmented during reading. The effect is as follows:
Next, we build a DataLoader on top of the Dataset defined above. Note that Dataset and DataLoader are two different concepts that serve different purposes.
- Dataset: encapsulates the data set and provides indexed access to individual samples
- DataLoader: wraps a Dataset and provides iteration over the data in batches
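The division of labor between the two can be seen in a small self-contained sketch. `ToyDigitDataset` is a hypothetical stand-in that returns random pixels instead of reading image files; the padding to 5 characters with the value 10 follows the SVHNDataset code above.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDigitDataset(Dataset):
    """Hypothetical stand-in for a street-view digit dataset."""

    def __init__(self, labels, max_len=5, pad_value=10):
        self.labels = labels
        self.max_len = max_len
        self.pad_value = pad_value

    def __getitem__(self, index):
        # Random pixels replace a real image file in this sketch.
        img = torch.rand(3, 64, 128)
        lbl = list(self.labels[index])[:self.max_len]
        lbl = lbl + [self.pad_value] * (self.max_len - len(lbl))
        return img, torch.tensor(lbl, dtype=torch.long)

    def __len__(self):
        return len(self.labels)

labels = [[1, 9], [2, 3], [5], [7, 2, 0], [4, 4, 1, 8], [6]]
dataset = ToyDigitDataset(labels)
loader = DataLoader(dataset, batch_size=4, shuffle=False, num_workers=0)

img, lbl = dataset[2]              # Dataset: one sample by index
print(img.shape, lbl.tolist())
for images, targets in loader:     # DataLoader: batches of samples
    print(images.shape, targets.shape)
```

Because every label is padded to the same length, the DataLoader can stack the samples into tensors; with six samples and a batch size of 4, the last batch holds the remaining two.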
After adding the DataLoader, the data reading code becomes:
import os, sys, glob, shutil, json
import cv2
from PIL import Image
import numpy as np
import torch
from torch.utils.data.dataset import Dataset
import torchvision.transforms as transforms

class SVHNDataset(Dataset):
    def __init__(self, img_path, img_label, transform=None):
        self.img_path = img_path
        self.img_label = img_label
        self.transform = transform  # None means no transform is applied

    def __getitem__(self, index):
        img = Image.open(self.img_path[index]).convert('RGB')
        if self.transform is not None:
            img = self.transform(img)
        # In the original SVHN, class 10 stands for the digit 0;
        # here 10 pads labels that are shorter than 5 characters
        # np.int was removed from recent NumPy releases, so use np.int64
        lbl = np.array(self.img_label[index], dtype=np.int64)
        lbl = list(lbl) + (5 - len(lbl)) * [10]
        return img, torch.from_numpy(np.array(lbl[:5]))

    def __len__(self):
        return len(self.img_path)

train_path = glob.glob('../input/train/*.png')
train_path.sort()
train_json = json.load(open('../input/train.json'))
train_label = [train_json[x]['label'] for x in train_json]

train_loader = torch.utils.data.DataLoader(
    SVHNDataset(train_path, train_label,
        transforms.Compose([
            transforms.Resize((64, 128)),
            transforms.ColorJitter(0.3, 0.3, 0.2),
            transforms.RandomRotation(5),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ])),
    batch_size=10,   # number of samples per batch
    shuffle=False,   # whether to shuffle the order
    num_workers=10,  # number of worker processes used for loading
)

for data in train_loader:
    break
After adding the DataLoader, data is obtained in batches: for each batch, the DataLoader calls the Dataset to read individual samples and stacks them together. The format of data is now:
torch.Size([10, 3, 64, 128]), torch.Size([10, 5])
The former is the image batch, in the order batch size * channels * height * width; the latter is the character labels, padded to 5 characters per image by the code above.
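To show where the channels-first image layout comes from, here is a NumPy sketch of what the `ToTensor` and `Normalize` steps in the pipeline compute. It is an illustration of the arithmetic, not torchvision's implementation; `to_normalized_chw` is a hypothetical helper, and the mean and standard deviation are the values used in the code above (the commonly used ImageNet statistics).

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def to_normalized_chw(img_hwc_uint8):
    # Step 1 (ToTensor): scale to [0, 1] and move channels first.
    x = img_hwc_uint8.astype(np.float32) / 255.0
    x = x.transpose(2, 0, 1)                     # H x W x C -> C x H x W
    # Step 2 (Normalize): per-channel (x - mean) / std.
    return (x - mean[:, None, None]) / std[:, None, None]

img = np.full((64, 128, 3), 255, dtype=np.uint8)  # an all-white picture
x = to_normalized_chw(img)
batch = np.stack([x] * 10)                        # stack 10 samples into a batch
print(x.shape, batch.shape)
```

Stacking ten such samples gives the batch size * channels * height * width layout described above.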
- The content of this blog comes from the DataWhale CV project learning