The principle and method of data augmentation with Keras ImageDataGenerator

Summary

In this post, you will learn how to perform data augmentation using Keras's ImageDataGenerator class. We will also cover what data augmentation is, the types of data augmentation, why it is used, and what it can and cannot do.

There are three types of data augmentation. By default, Keras's ImageDataGenerator class performs in-place/on-the-fly data augmentation.

Once overfitting is detected, the two usual remedies are to (1) reduce model capacity or (2) apply regularization.

Data augmentation is a form of regularization that helps our network generalize better to the test/validation set.

Training without data augmentation can lead to overfitting. Applying data augmentation yields smoother training, avoids overfitting, and achieves higher accuracy and lower loss.

Using data augmentation is strongly recommended whenever you train a neural network.


1. What is Keras ImageDataGenerator

Keras's ImageDataGenerator is widely used when training convolutional neural networks. It applies a series of random transformations to the training set before the model is trained on it, improving the model's versatility and giving it better generalization ability.

A model trained on the augmented data is more likely to generalize to example data points not included in the training set.

Augmented images can be obtained with simple geometric transformations such as translation, rotation, zooming in/out, shearing, and horizontal/vertical flipping.

Applying small transformations to an input image slightly changes its appearance but does not change its class label, which makes data augmentation a very natural and convenient method for computer vision tasks.
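To make this concrete, here is a minimal sketch of the idea. The random array standing in for a real 64x64 RGB image and the transformation parameters are assumptions, not values from this post:

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# a random array stands in for a real 64x64 RGB training image
image = np.random.rand(64, 64, 3)

# small, label-preserving transformations
aug = ImageDataGenerator(rotation_range=15, width_shift_range=0.1)

# apply one random transformation to the single image
augmented = aug.random_transform(image)

# (64, 64, 3): same shape and class label, slightly different appearance
print(augmented.shape)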

2. The working principle of Keras ImageDataGenerator

ImageDataGenerator accepts the original data, randomly transforms it, and returns only the transformed data. Concretely, it works as follows (a short sketch appears after the list):

  1. Accept a batch of images to be used for training;
  2. Apply a series of random transformations (random rotation, resizing, shearing, etc.) to each image in the batch;
  3. Replace the original batch with the new, randomly transformed batch;
  4. Train the CNN on the randomly transformed batch (the original data itself is not used for training).
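A minimal sketch of these four steps, with toy arrays standing in for real images (shapes and parameter values are assumptions): flow() accepts a batch, applies random transformations, and yields only the transformed batch, so the original images never reach the model.

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

X = np.random.rand(8, 64, 64, 3)  # a toy batch of 8 "images"
y = np.zeros(8)                   # toy labels

aug = ImageDataGenerator(rotation_range=20, horizontal_flip=True)

# steps 1-3: accept the batch, transform it, and yield the replacement batch
batchX, batchY = next(aug.flow(X, y, batch_size=8, shuffle=False))

# step 4 would train on batchX; note that the original X is not what comes back
print(np.allclose(batchX, X))  # almost certainly False: only transformed data is returned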

3. Three types of Keras ImageDataGenerator

  • (1) Generating a dataset / expanding data through data augmentation (less common)

There is a problem with this method: the model's generalization ability is not fundamentally improved.

Imagine generating 100 images from a single image and then training on them; all of that data is still derived from an ultra-small dataset.
We cannot train a network on a tiny amount of data and then expect it to generalize to data it has never been trained on and has never seen before. A minimal sketch of this approach is shown below.
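generate_images.py itself is not reproduced in this post; the following is only a minimal sketch of the type-1 idea under assumed inputs (dog.jpg and generated_dataset/dogs are hypothetical, and the output directory must already exist): write 100 augmented copies of one image to disk, then train on them.

import numpy as np
from tensorflow.keras.preprocessing.image import (ImageDataGenerator,
                                                  load_img, img_to_array)

# load the single source image and add a batch axis, since flow() expects one
image = img_to_array(load_img("dog.jpg"))
image = np.expand_dims(image, axis=0)

aug = ImageDataGenerator(rotation_range=30, zoom_range=0.15,
                         width_shift_range=0.2, height_shift_range=0.2,
                         shear_range=0.15, horizontal_flip=True,
                         fill_mode="nearest")

# each batch drawn from the generator also writes one augmented file to disk
gen = aug.flow(image, batch_size=1, save_to_dir="generated_dataset/dogs",
               save_prefix="image", save_format="jpg")
for _ in range(100):
    next(gen)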

  • (2) In-place/on-the-fly data augmentation (most common)

This is the most commonly used kind of augmentation. There are two points to note:

  1. ImageDataGenerator does not return both the original data and the transformed data; it returns only the randomly transformed data;
  2. Because the expansion happens during training, it is called "in-place" or "on-the-fly" data augmentation (the augmented examples are not generated ahead of training).

Since training uses data that has been randomly translated, rotated, sheared, and so on, the model generalizes relatively well and performs well on the test set; it may do slightly worse on the training set, since the original, untransformed training data is never used, which introduces a certain bias.

  • (3) Combining dataset generation with in-place augmentation

When training data is scarce and real-world data is difficult to collect, you can apply Type 2 augmentation (in-place/on-the-fly) to data collected through simulation.
This is similar to behavioral cloning as used in autonomous-driving applications.

4. Project structure

[Figure: project directory structure]

5. Implement generate_images.py and train.py, and train the CNN

  • generate_images.py generates a dataset with augmented data
  • train.py performs model training under the different augmentation settings

(1) Generate 100 training images from a single image and train the CNN: 50% accuracy
(2) Train the CNN on a subset of the Kaggle Dogs vs. Cats dataset, without data augmentation: 64% accuracy
(3) Train the CNN on a subset of the Kaggle Dogs vs. Cats dataset, with data augmentation during training: 69% accuracy

Training accuracy/loss plot generated by (1): [figure]
Training accuracy/loss plot generated by (2): [figure]
Training accuracy/loss plot generated by (3): [figure] (convergence is better, and there is no stretch where accuracy climbs while the loss also rises; overfitting is avoided and generalization is better)

Conclusions:

  1. Data augmentation can reduce overfitting and improve the model's ability to generalize;
  2. Data augmentation is a form of regularization: the validation and training losses decrease with almost no divergence, and the classification accuracy of the training and validation splits improves together;
  3. Overfitting can be overcome by using data augmentation!
# Testing the models trained with the three types of data augmentation

# Experiment 1: generate 100 training images from a single image and train -> 50% accuracy
# python train.py --dataset generated_dataset --plot plot_generated_dataset.png

# Experiments 2 and 3 explore how data augmentation reduces overfitting and improves the
# model's ability to generalize. The two usual remedies once overfitting is detected are
# (1) reducing model capacity or (2) applying regularization.

# Experiment 2: without data augmentation -> 64% accuracy
# python train.py --dataset dogs_vs_cats_small --plot plot_dogs_vs_cats_no_aug.png

# Experiment 3: with data augmentation, acting as a form of regularization -> 69% accuracy
# (Note how the validation and training losses decrease with almost no divergence; likewise,
# the classification accuracy of the training and validation splits improves together.)
# Overfitting can be overcome by using data augmentation!
# Using data augmentation is strongly recommended whenever you train a neural network.
# python train.py --dataset dogs_vs_cats_small --augment 1 --plot plot_dogs_vs_cats_with_aug.png

# Import the necessary packages
# Set the matplotlib backend to Agg so training plots can be saved to disk
import matplotlib
matplotlib.use("Agg")

from pyimagesearch.resnet import ResNet
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
# import ImageDataGenerator
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.utils import to_categorical
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import cv2
import os

# Build the command-line arguments:
#  --dataset path to the input dataset
#  --augment whether to use data augmentation type 2 (the three types: 1. generating a
#            dataset via augmentation (less common); 2. in-place/on-the-fly augmentation
#            (most common); 3. combining dataset generation with in-place augmentation)
#  --plot path for saving the loss/accuracy plot
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
                help="path to input dataset")
ap.add_argument("-a", "--augment", type=int, default=-1,
                help="whether or not 'on the fly' data augmentation should be used")
ap.add_argument("-p", "--plot", type=str, default="plot.png",
                help="path to output loss/accuracy plot")
args = vars(ap.parse_args())

# Initialize the initial learning rate, batch size, and number of epochs to train for
INIT_LR = 1e-1
BS = 8
EPOCHS = 50

# Grab the paths to the dataset images; the data and labels will be stored in lists in order
print("[INFO] loading images...")
imagePaths = list(paths.list_images(args["dataset"]))
data = []
labels = []

# loop over the image paths
for imagePath in imagePaths:

    # extract the class label from the file path, load the image, and resize it to 64x64, ignoring the aspect ratio
    label = imagePath.split(os.path.sep)[-2]
    image = cv2.imread(imagePath)
    image = cv2.resize(image, (64, 64))

    # update the data and labels lists
    data.append(image)
    labels.append(label)

# convert the data and labels lists to NumPy arrays, scaling the pixel intensities to the range [0, 1]
data = np.array(data, dtype="float") / 255.0

# encode the class labels, converting them from strings to integers and then to one-hot arrays (e.g. [1, 0] for cats, [0, 1] for dogs)
le = LabelEncoder()
labels = le.fit_transform(labels)
labels = to_categorical(labels, 2)

# partition the data into 75% training and 25% testing splits
(trainX, testX, trainY, testY) = train_test_split(data, labels,
                                                  test_size=0.25, random_state=42)

# initialize the data augmentation object (an empty, no-op generator by default)
aug = ImageDataGenerator()

# check the value of the --augment flag to see whether data augmentation should be applied
if args["augment"] > 0:
    print("[INFO] performing 'on the fly' data augmentation")
    # random rotations, zooms, shifts, shears, and flips
    aug = ImageDataGenerator(
        rotation_range=20,
        zoom_range=0.15,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.15,
        horizontal_flip=True,
        fill_mode="nearest")

# Initialize the optimizer and model
# Build our ResNet model with stochastic gradient descent optimization and learning-rate decay. We use "binary_crossentropy" loss for this 2-class problem. If you have more than two class labels, be sure to use "categorical_crossentropy".
print("[INFO] compiling model...")
opt = SGD(learning_rate=INIT_LR, momentum=0.9, decay=INIT_LR / EPOCHS)
model = ResNet.build(64, 64, 3, 2, (2, 3, 4),
                     (32, 64, 128, 256), reg=0.0001)
model.compile(loss="binary_crossentropy", optimizer=opt,
              metrics=["accuracy"])

# Train the model
# The aug object handles the data in batches (it applies data augmentation only if the --augment command-line flag was set)
print("[INFO] training network for {} epochs...".format(EPOCHS))
H = model.fit(
    x=aug.flow(trainX, trainY, batch_size=BS),
    validation_data=(testX, testY),
    steps_per_epoch=len(trainX) // BS,
    epochs=EPOCHS)

# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(x=testX.astype("float32"), batch_size=BS)
print(classification_report(testY.argmax(axis=1),
                            predictions.argmax(axis=1), target_names=le.classes_))

# plot the training loss and accuracy
N = np.arange(0, EPOCHS)
plt.style.use("ggplot")
plt.figure()
plt.plot(N, H.history["loss"], label="train_loss")
plt.plot(N, H.history["val_loss"], label="val_loss")
plt.plot(N, H.history["accuracy"], label="train_acc")
plt.plot(N, H.history["val_accuracy"], label="val_acc")
plt.title("Training Loss and Accuracy on Dataset")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig(args["plot"])

Reference:

blog.csdn.net/qq_40985985/article/details/106966561