Object Recognition with VGG in Keras

As is well known, convolutional neural networks (CNNs) have achieved great success in computer vision, repeatedly taking first place in object recognition competitions.

1. VGG16() in Keras

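Keras is one of the most popular deep learning frameworks and is very easy to work with. Its friendly, flexible APIs make it easier for newcomers to pick up than TensorFlow. Keras also ships with several popular pre-trained neural network models (see reference [2] for more), so users can conveniently build on existing work to solve their own problems. This article uses VGG16 as an example to show how to perform an object recognition task in Keras. VGG16 is a CNN-based deep network designed and implemented by a research team at the University of Oxford. Its depth is listed as 23 (including 16 weight layers), and its weights total more than 500 MB. The figure below shows its basic structure.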
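The "more than 500 MB" figure can be sanity-checked with a little arithmetic. The sketch below is not taken from Keras; it assumes the standard VGG16 layout (thirteen 3x3 convolutions with biases, three fully connected layers, a 224x224x3 input) and 4-byte float32 weights. The function name count_vgg16_params is made up for this illustration.

```python
# Rough parameter count for VGG16 (configuration D in the VGG paper).
# Assumes 3x3 convolutions with biases, a 224x224x3 input, and float32 weights.

# Output channels of the 13 convolutional layers, grouped into 5 blocks;
# each block ends with a 2x2 max-pooling layer that has no weights.
conv_blocks = [[64, 64], [128, 128], [256, 256, 256],
               [512, 512, 512], [512, 512, 512]]


def count_vgg16_params(num_classes=1000):
    """Hypothetical helper: count weights and biases layer by layer."""
    total = 0
    in_channels = 3  # RGB input
    for block in conv_blocks:
        for out_channels in block:
            # 3x3 kernel weights plus one bias per output channel
            total += 3 * 3 * in_channels * out_channels + out_channels
            in_channels = out_channels
    # After five 2x2 poolings, 224 / 2**5 = 7, so the flattened size is 7*7*512
    fan_in = 7 * 7 * in_channels
    for fan_out in (4096, 4096, num_classes):
        total += fan_in * fan_out + fan_out
        fan_in = fan_out
    return total


params = count_vgg16_params()
size_mib = params * 4 / 2**20  # 4 bytes per float32 weight
print(params)              # 138357544
print(round(size_mib, 1))  # 527.8
```

Most of the parameters sit in the first fully connected layer (7*7*512 inputs times 4096 outputs), which is why dropping the top layers shrinks the model so much.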

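According to reference [2], the pre-trained VGG16 network can be imported directly with the commands below. Because the parameters total more than 500 MB, Keras has to download them the first time the commands run, which may take some time.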

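from keras.applications.vgg16 import VGG16
# build VGG16 with ImageNet weights (downloaded on first use)
model = VGG16()
# summary() prints the layer structure itself and returns None
model.summary()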

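The last statement prints the layer structure of the VGG16 network; readers can inspect the result on their own machines, so it is not reproduced here. In addition, the VGG16() constructor accepts several arguments that make it easy to customize the network structure, which is especially useful in transfer learning. Some of these arguments are listed below for interested readers: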

include_top (True): Whether or not to include the output layers for the model. You don’t need these if you are fitting the model on your own problem.
weights ('imagenet'): What weights to load. You can specify None to not load pre-trained weights if you are interested in training the model yourself from scratch.
input_tensor (None): A new input layer if you intend to fit the model on new data of a different size.
input_shape (None): The size of images that the model is expected to take if you change the input layer.
pooling (None): The type of pooling to use when you are training a new set of output layers.
classes (1000): The number of classes (e.g. size of output vector) for the model.
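To make the effect of include_top, input_shape, pooling, and classes more concrete, here is a small pure-Python sketch of the output shape each combination should produce. It does not call Keras; vgg16_output_shape is a hypothetical helper that only encodes the assumption that VGG16 has five 2x2 max-pooling stages and 512 channels in its last convolutional block.

```python
def vgg16_output_shape(input_shape=(224, 224, 3), include_top=True,
                       pooling=None, classes=1000):
    """Hypothetical helper: expected output shape, without the batch axis."""
    if include_top:
        # The fully connected head ends in one score per class
        return (classes,)
    height, width, _ = input_shape
    # Five 2x2 max-pooling layers each halve the spatial size (rounding down)
    for _ in range(5):
        height, width = height // 2, width // 2
    if pooling in ('avg', 'max'):
        # Global pooling collapses the spatial axes, leaving 512 channels
        return (512,)
    return (height, width, 512)


print(vgg16_output_shape())                                    # (1000,)
print(vgg16_output_shape(include_top=False))                   # (7, 7, 512)
print(vgg16_output_shape((150, 150, 3), include_top=False))    # (4, 4, 512)
print(vgg16_output_shape(include_top=False, pooling='avg'))    # (512,)
```

In a transfer learning setup, the shape returned for include_top=False is what a new set of output layers would receive as input.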

2. Building a Simple Image Classifier

First, prepare an image to classify, for example the one below (assume the file is named dog.6621.jpg), which shows a basset hound.

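Keras provides convenient APIs for loading image data. The load_img() function loads an image, and its target_size argument sets the size of the loaded image, so every image is resized to the same dimensions regardless of its original size. This makes it easy to feed the data into the network; the default VGG16 model expects 224x224 inputs.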

from keras.preprocessing.image import load_img, img_to_array

# load the image from file, resized to 224x224
image = load_img('dog.6621.jpg', target_size=(224, 224))
# convert the image pixels to a NumPy array
image = img_to_array(image)

The img_to_array() function then converts the pixel data into a NumPy array so that Keras can work with it.

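The network takes one or more images as input, so the input array needs four dimensions: samples, rows, columns, and channels. Since we have only one sample (a single image), we need to reshape the array to add the samples dimension.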

# reshape data for the model
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
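The reshape step can be tried without an image file. The sketch below uses a zero-filled NumPy array as a stand-in for the output of img_to_array() (an assumption made purely for illustration) and shows that np.expand_dims gives the same result with less typing.

```python
import numpy as np

# Stand-in for the array returned by img_to_array(): rows, columns, channels
image = np.zeros((224, 224, 3), dtype=np.float32)

# Same reshape as in the article: prepend a samples axis of length 1
batch = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
print(batch.shape)      # (1, 224, 224, 3)

# np.expand_dims adds the axis without spelling out every dimension
batch_alt = np.expand_dims(image, axis=0)
print(batch_alt.shape)  # (1, 224, 224, 3)
```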

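Next, the image needs some preprocessing. According to reference [4]: "The only preprocessing we do is subtracting the mean RGB value, computed on the training set, from each pixel." Keras provides a function called preprocess_input() that applies this preprocessing to new inputs to the network.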

from keras.applications.vgg16 import preprocess_input
# prepare the image for the VGG model
image = preprocess_input(image)
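For intuition, the mean subtraction can be sketched in plain NumPy. This is an approximation of what preprocess_input() is understood to do for VGG16, not its actual source: it assumes the channels are reordered from RGB to BGR and that the per-channel ImageNet means are 103.939, 116.779, and 123.68. The name vgg_preprocess_sketch is hypothetical.

```python
import numpy as np

# Assumed per-channel ImageNet means, in BGR order
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def vgg_preprocess_sketch(batch_rgb):
    """Hypothetical stand-in for preprocess_input(): RGB -> BGR, then subtract the mean."""
    batch = np.asarray(batch_rgb, dtype=np.float32)
    batch_bgr = batch[..., ::-1]           # reverse the channel axis
    return batch_bgr - IMAGENET_MEAN_BGR   # broadcasts over samples, rows, columns


# A single 2x2 "image" whose pixels are all (R, G, B) = (255, 128, 0)
toy = np.tile(np.array([255.0, 128.0, 0.0], dtype=np.float32), (1, 2, 2, 1))
out = vgg_preprocess_sketch(toy)
print(out.shape)     # (1, 2, 2, 3)
print(out[0, 0, 0])  # one pixel, now in B, G, R order and zero-centered
```

Note that no scaling to the 0 to 1 range takes place; the values stay on the 0 to 255 scale and are only shifted.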

3. Prediction and Evaluation

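To make predictions with a pre-trained model in Keras, simply call predict(). The VGG model predicts the probability that the image belongs to each of 1000 classes. This is analogous to handwritten digit recognition (MNIST) with softmax, where the output is a 10-dimensional vector and each element is the probability that the input is a particular digit.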

# predict the probability across all output classes
yhat = model.predict(image)
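The MNIST analogy can be made concrete with a toy softmax in NumPy. The scores below are made up; the point is only that softmax turns a vector of raw scores into probabilities that sum to 1, and that the predicted class is the index of the largest one. VGG16's output has the same form, with 1000 entries instead of 10.

```python
import numpy as np


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp)


# Made-up scores for 10 classes, standing in for the digits 0-9
scores = np.array([0.1, 0.2, 3.0, 0.3, 0.1, 0.0, 0.5, 0.2, 0.1, 0.4])
probs = softmax(scores)
print(probs.shape)            # (10,)
print(int(np.argmax(probs)))  # 2
```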

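Of course, this output vector is not very readable, and we still do not know what the image was classified as. For this, Keras provides decode_predictions(), which interprets the prediction vector. It returns a list of classes together with the predicted probability of each; typically you might print the top 3 predictions. Here we print only the most likely class: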

from keras.applications.vgg16 import decode_predictions
# convert the probabilities to class labels
label = decode_predictions(yhat)
# retrieve the most likely result, e.g. highest probability
label = label[0][0]
# print the classification
print('%s (%.2f%%)' % (label[1], label[2]*100))

Running the code above gives the final prediction basset (89.00%), which is exactly the result we expected.
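To see what picking the top 3 predictions amounts to, here is a minimal NumPy sketch of the ranking step. It is not the Keras implementation: top_k is a hypothetical helper, and the labels and probabilities are invented for illustration. With the real decode_predictions(), the top argument controls how many classes are returned.

```python
import numpy as np


def top_k(probs, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    order = np.argsort(probs)[::-1][:k]  # indices sorted by descending probability
    return [(labels[i], float(probs[i])) for i in order]


# Toy probabilities and labels, invented for illustration only
labels = ['basset', 'beagle', 'bloodhound', 'tabby', 'goldfish']
probs = np.array([0.70, 0.15, 0.10, 0.04, 0.01])

for name, p in top_k(probs, labels):
    print('%s (%.2f%%)' % (name, p * 100))
# basset (70.00%)
# beagle (15.00%)
# bloodhound (10.00%)
```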

References

[1] https://machinelearningmastery.com/use-pre-trained-vgg-model-classify-objects-photographs/

[2] https://keras.io/applications/

[3] Keras video series

[4] VGG paper (https://arxiv.org/pdf/1409.1556.pdf)