Deep Learning with Keras: Gender Classification with a Pre-trained VGG16 Model


0. Introduction

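In "Gender Classification with a Convolutional Neural Network", we saw that training a convolutional neural network (CNN) from scratch can run into the following problems:

The training dataset contains too few images, which makes it hard for the model to learn
When the images are large, the convolutions may fail to learn all the features in the image

The first problem can be addressed by adding more data to the dataset. The second can be addressed by training a deeper network architecture for more epochs. Although these measures can solve the problems above, in practice we often cannot obtain more data. In that case, transfer learning with a pre-trained model offers a quick way around both problems.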

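1. Transfer Learning

1.1 Introduction to the ImageNet Dataset

Before introducing transfer learning, let us look at ImageNet, the dataset used to train the pre-trained models that ship with Keras. ImageNet is a large image dataset that underpins a well-known image recognition competition in which participants predict the category of each image. The dataset contains millions of images of varying sizes across many categories. A large number of research teams have taken part in the competition and proposed different neural network models for predicting image categories. With millions of images available, the amount of data is not a problem, and because the teams built very large network architectures in pursuit of better performance, the second problem above is addressed as well.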

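1.2 Transfer Learning

Put simply, transfer learning means reusing a pre-trained model for a different task. We can reuse a convolutional neural network that was built on a different dataset: its convolutional layers have already learned a wide range of image features (because the pre-trained model saw a very large number of images, these features generalize well across many kinds of images), and we pass these features to fully connected layers so that we can predict the classes of images in the new dataset. Keras provides several pre-trained models proposed by different researchers; in this section we use VGG16.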

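1.3 Gender Classification with the Pre-trained VGG16 Model

In this section, we learn how to use the pre-trained VGG16 network for gender classification. We first look at the architecture of the VGG16 model to get a feel for its structure: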

Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 256, 256, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 256, 256, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 256, 256, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 128, 128, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 128, 128, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 128, 128, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 64, 64, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 64, 64, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 64, 64, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 64, 64, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 32, 32, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 32, 32, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 32, 32, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 32, 32, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 16, 16, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 16, 16, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 16, 16, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 16, 16, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 8, 8, 512)         0         
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
__________________________________________________________________

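The architecture is similar to the CNN we trained in "Gender Classification with a Convolutional Neural Network". The main differences are that this model is deeper, with more convolutional and pooling layers, and that the VGG16 weights were obtained by training on millions of images.

When we use the pre-trained model to classify the gender of the person in an image, we make sure that the VGG16 weights are frozen and not updated. An image of size 256 x 256 x 3 passed through the VGG16 network yields a feature map of shape 8 x 8 x 512. We keep the weights of the original network unchanged, take the 8 x 8 x 512 output, pass it through one more convolution and pooling step, flatten the result, and connect it to fully connected layers; a final Sigmoid activation determines whether the person in the image is male or female.

In essence, by using the convolutional and pooling layers of VGG16 we reuse convolution kernels that were trained on a dataset of millions of images. We then fine-tune the output of these convolutional and pooling layers for the objects we want to predict.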
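
As a quick sanity check on the shapes quoted above, the following sketch traces the spatial size of the feature map through the five VGG16 blocks. It assumes, as the model summary suggests, that the convolutions preserve the spatial size and that each block ends with a 2 x 2 max-pooling layer that halves it. The helper name vgg16_feature_shape is hypothetical and not part of Keras.

```python
# Channel count produced by each of the five VGG16 convolutional blocks
# (taken from the model summary above).
VGG16_BLOCK_CHANNELS = [64, 128, 256, 512, 512]


def vgg16_feature_shape(height, width):
    """Return (height, width, channels) after the last VGG16 pooling layer.

    Assumes size-preserving convolutions and one 2 x 2 max-pooling layer
    with stride 2 at the end of each block.
    """
    channels = 3  # RGB input
    for block_channels in VGG16_BLOCK_CHANNELS:
        channels = block_channels
        height //= 2  # each pooling layer halves the height
        width //= 2   # ... and the width
    return height, width, channels


print(vgg16_feature_shape(256, 256))  # (8, 8, 512)
print(vgg16_feature_shape(224, 224))  # (7, 7, 512)
```

Five halvings divide each spatial dimension by 32, which is why a 256 x 256 input ends up as an 8 x 8 feature map.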

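2. Fine-tuning the Model

2.1 Model Implementation

In this section, we implement the transfer learning strategy described above in Keras.

First, import the required libraries and the pre-trained VGG16 model. We do not use the fully connected layers at the top of VGG16, i.e. include_top=False, so that we can later fine-tune the model for our own problem. We also specify an input image shape of 256 x 256 x 3: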

from keras.applications import VGG16
from keras.applications.vgg16 import preprocess_input
from glob import glob
from skimage import io
import cv2
import numpy as np
from sklearn.model_selection import train_test_split

vgg16_model = VGG16(include_top=False, weights='imagenet', input_shape=(256, 256, 3))

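Preprocess the image dataset. This step ensures that the image data is in the form the pre-trained model expects as input. For example, to preprocess a single image named img, we call the preprocess_input method, which applies the preprocessing required by VGG16: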

from keras.applications.vgg16 import preprocess_input
img = preprocess_input(img.reshape(1,256,256,3))
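
To make the preprocessing step less opaque, here is a minimal NumPy sketch of what preprocess_input for VGG16 is generally documented to do in its default mode: reorder the channels from RGB to BGR and subtract the per-channel ImageNet means, without scaling. The function name vgg16_preprocess_sketch is hypothetical, the mean values are the ones commonly quoted for this mode, and the all-white image is dummy data; in real code, prefer the library function.

```python
import numpy as np

# ImageNet channel means in BGR order (assumed values, as commonly quoted
# for the 'caffe'-style preprocessing used with VGG16).
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def vgg16_preprocess_sketch(batch_rgb):
    """Approximate VGG16 preprocessing for a batch shaped (N, H, W, 3)."""
    batch = np.asarray(batch_rgb, dtype=np.float32)
    batch_bgr = batch[..., ::-1]           # RGB -> BGR
    return batch_bgr - IMAGENET_MEAN_BGR   # zero-center each channel


# A dummy all-white image stands in for a real photo here.
img = np.full((256, 256, 3), 255, dtype=np.uint8)
out = vgg16_preprocess_sketch(img.reshape(1, 256, 256, 3))
print(out.shape)     # (1, 256, 256, 3)
print(out[0, 0, 0])  # blue, green, red values after mean subtraction
```

Because the pixel values are not rescaled to the range 0 to 1, images should be passed in with their original 0 to 255 values.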

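Create the input and output datasets. We first load the images, exactly as in "Gender Classification with a Convolutional Neural Network", and then add a step that extracts features with the VGG16 model.

We run every image through VGG16 and use the VGG16 output as the input of the fine-tuning model that follows. Before being fed to VGG16, each image must be preprocessed with preprocess_input, as shown below: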

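x = []  # raw images
y = []  # labels: 0 (male) for a_resized, 1 (female) for b_resized
for i in glob('man_woman/a_resized/*.jpg')[:800]:
    try:
        image = io.imread(i)
        x.append(image)
        y.append(0)
    except Exception:
        # Skip files that cannot be read
        continue

for i in glob('man_woman/b_resized/*.jpg')[:800]:
    try:
        image = io.imread(i)
        x.append(image)
        y.append(1)
    except Exception:
        # Skip files that cannot be read
        continue


x_vgg16 = []
for i in range(len(x)):
    img = x[i]
    # Add a batch axis and apply the VGG16-specific preprocessing
    img = preprocess_input(img.reshape((1, 256, 256, 3)))

    # Pass the preprocessed input to the VGG16 model to extract features
    img_feature = vgg16_model.predict(img)
    x_vgg16.append(img_feature)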

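In the code above, besides passing each input image through VGG16, we also store the extracted features img_feature in the list x_vgg16, which serves as the input of the fine-tuning model.

Next, convert the inputs and outputs to NumPy arrays and create the training and test datasets: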

x_vgg16 = np.array(x_vgg16)
x_vgg16 = x_vgg16.reshape(x_vgg16.shape[0], x_vgg16.shape[2], x_vgg16.shape[3], x_vgg16.shape[4])
y = np.array(y)
x_train, x_test, y_train, y_test = train_test_split(x_vgg16, y, test_size=0.2)
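
The reshape in the snippet above is needed because each call to predict returns a batch of size 1, so stacking the results gives a five-dimensional array. The sketch below reproduces this with dummy zero arrays in place of real VGG16 features (an assumption made so that it runs without the model) and shows that np.concatenate yields the same result.

```python
import numpy as np

# Stand-ins for the per-image outputs of vgg16_model.predict:
# each has shape (1, 8, 8, 512) because predict keeps the batch axis.
features = [np.zeros((1, 8, 8, 512), dtype=np.float32) for _ in range(4)]

stacked = np.array(features)
print(stacked.shape)  # (4, 1, 8, 8, 512)

# Same reshape as in the code above: drop the size-1 axis.
reshaped = stacked.reshape(stacked.shape[0], stacked.shape[2],
                           stacked.shape[3], stacked.shape[4])
print(reshaped.shape)  # (4, 8, 8, 512)

# Equivalent alternative: join the size-1 batches along the first axis.
concatenated = np.concatenate(features, axis=0)
print(concatenated.shape)  # (4, 8, 8, 512)
```

Either form gives an array of shape (number of images, 8, 8, 512), which is what the first layer of the fine-tuning model expects.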

Build the model:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense
model_fine_tuning = Sequential()
model_fine_tuning.add(Conv2D(512, 
                        kernel_size=(3, 3),
                        activation='relu',
                        input_shape=(x_train.shape[1], x_train.shape[2], x_train.shape[3])))
model_fine_tuning.add(MaxPooling2D(pool_size=(2, 2)))
model_fine_tuning.add(Flatten())
model_fine_tuning.add(Dense(128, activation='relu'))
model_fine_tuning.add(Dropout(0.5))
model_fine_tuning.add(Dense(1, activation='sigmoid'))
model_fine_tuning.summary()

The summary of the fine-tuning model is printed as follows:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 6, 6, 512)         2359808   
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 3, 3, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 4608)              0         
_________________________________________________________________
dense (Dense)                (None, 128)               589952    
_________________________________________________________________
dropout (Dropout)            (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 129       
=================================================================
Total params: 2,949,889
Trainable params: 2,949,889
Non-trainable params: 0
_________________________________________________________________
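
The parameter counts in the summary can be reproduced by hand. The sketch below uses the layer sizes shown in the summary (3 x 3 kernels, 512 input and output channels, a 128-unit hidden layer); the helper names conv2d_params and dense_params are hypothetical and only serve the illustration.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return in_units * out_units + out_units


conv = conv2d_params(3, 3, 512, 512)   # 3 x 3 kernels on 512 input channels
flat_units = 3 * 3 * 512               # 3 x 3 x 512 feature map, flattened
hidden = dense_params(flat_units, 128)
output = dense_params(128, 1)

print(conv, hidden, output)    # 2359808 589952 129
print(conv + hidden + output)  # 2949889
```

Most of the trainable parameters sit in the single convolutional layer, which is still far fewer than the 14.7 million frozen parameters of VGG16 itself.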

Compile and fit the model:

model_fine_tuning.compile(loss='binary_crossentropy',optimizer='adam',metrics=['acc'])

history = model_fine_tuning.fit(x_train, y_train,
                                    batch_size=32,
                                    epochs=20,
                                    verbose=1,
                                    validation_data = (x_test, y_test))
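
For reference, the binary_crossentropy loss used above averages the negative log-likelihood of the true labels under the predicted probabilities. The sketch below is a plain-Python illustration with made-up labels and predictions; the function and the clipping constant eps are assumptions for the example, not the Keras implementation.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over paired labels and probabilities."""
    total = 0.0
    for label, prob in zip(y_true, y_pred):
        prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
        total += -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))
    return total / len(y_true)


# Made-up example: 0 = male, 1 = female, as in the labels above.
labels = [0, 1, 1, 0]
confident = [0.1, 0.9, 0.8, 0.2]
unsure = [0.5, 0.5, 0.5, 0.5]

print(binary_crossentropy(labels, confident))  # small loss
print(binary_crossentropy(labels, unsure))     # about 0.693 (ln 2)
```

A model that always outputs 0.5 scores ln 2, so a validation loss well below that value indicates that the classifier has learned something useful.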

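After training, we can see that the model quickly (in roughly 3 to 5 epochs) reaches about 95% accuracy on the test dataset, whereas the gender classification CNN we trained in "Gender Classification with a Convolutional Neural Network" could not reach 95% accuracy within 5 epochs in any of the scenarios we tried.

[Figure: model performance]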

2.2 Examples of Misclassified Images

The code below displays some of the image samples that the model misclassifies:

import matplotlib.pyplot as plt

# Re-split the raw images so that the test images can be displayed.
# Note: train_test_split shuffles randomly, so unless both calls are given
# the same random_state, this test set differs from the one used in training.
x = np.array(x)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

# Extract VGG16 features for every test image
x_test_vgg16 = []
for i in range(len(x_test)):
    img = x_test[i]
    img = preprocess_input(img.reshape((1, 256, 256, 3)))
    img_feature = vgg16_model.predict(img)
    x_test_vgg16.append(img_feature)

x_test_vgg16 = np.array(x_test_vgg16)
x_test_vgg16 = x_test_vgg16.reshape(x_test_vgg16.shape[0], x_test_vgg16.shape[2], x_test_vgg16.shape[3], x_test_vgg16.shape[4])
y_pred = model_fine_tuning.predict(x_test_vgg16)

# Sort test samples by absolute error; the worst predictions come last
wrong = np.argsort(np.abs(y_pred.flatten()-y_test))
print(wrong)

y_test_char = np.where(y_test==0,'M','F')
y_pred_char = np.where(y_pred>0.5,'F','M')

# Show the four samples with the largest prediction error
for k in range(1, 5):
    idx = wrong[-k]
    plt.subplot(2, 2, k)
    plt.imshow(x_test[idx])
    plt.title('Actual: '+str(y_test_char[idx])+', '+'Predicted: '+str(y_pred_char[idx][0]))
plt.show()
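
The ranking step above relies on np.argsort returning indices in ascending order of error, so the worst predictions sit at the end of the array. A small sketch with made-up predictions and labels (not results from the model) illustrates this:

```python
import numpy as np

# Made-up predictions and labels (0 = male, 1 = female).
y_pred = np.array([[0.95], [0.10], [0.40], [0.80]])
y_test = np.array([1, 1, 0, 0])

# Absolute errors are approximately [0.05, 0.90, 0.40, 0.80].
errors = np.abs(y_pred.flatten() - y_test)
order = np.argsort(errors)  # indices from smallest to largest error
print(order)       # [0 2 3 1]
print(order[-2:])  # [3 1]: the two largest errors, worst one last
```

Indexing with order[-1], order[-2] and so on therefore walks through the samples from the largest error downward, which is how the plotting code selects the images to show.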

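[Figure: misclassified images]

We can see that the model may misclassify an image when it shows only part of a face, or when the face takes up only a small portion of the image.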