VGG16 Study Notes

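I. Overview

VGG is a convolutional neural network proposed in 2014 by the Visual Geometry Group at the University of Oxford. Thanks to its simple, uniform design and its practical usefulness, it quickly became one of the most popular convolutional networks of its time, and it performs well on both image classification and object detection. In the ILSVRC 2014 competition, VGG reached a top-5 accuracy of 92.7%.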

VGG architecture diagram

VGG-16

VGG comes in several variants. The most popular is VGG-16, which has 16 weight layers (13 convolutional and 3 fully connected). It expects input of shape 224×224×3.
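The "16" counts only layers that carry trainable weights. As a minimal sketch (the `VGG16_CONFIG` list and the helper names are illustrative, not part of any library), the count and the effect of the five pooling stages on a 224×224 input can be reproduced from the layer layout:

```python
# Layer layout of VGG-16: integers are conv output channels, "M" is a 2x2 max-pool.
# VGG16_CONFIG and the helpers below are illustrative names, not a library API.
VGG16_CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]
NUM_FC_LAYERS = 3  # two 4096-unit layers plus the 1000-way classifier


def count_weight_layers(config, num_fc):
    """Count layers with trainable weights (pooling layers have none)."""
    num_conv = sum(1 for item in config if item != "M")
    return num_conv + num_fc


def final_spatial_size(config, input_size=224):
    """Each 2x2, stride-2 pool halves height and width; 'same' convs keep them."""
    size = input_size
    for item in config:
        if item == "M":
            size //= 2
    return size


print(count_weight_layers(VGG16_CONFIG, NUM_FC_LAYERS))  # 16
print(final_spatial_size(VGG16_CONFIG))                  # 7
```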

II. Implementing VGG16 in Keras

from keras import Sequential
from keras.layers import Dense, Activation, Conv2D, MaxPooling2D, Flatten, Dropout
from keras.layers import Input
from keras.optimizers import SGD

model = Sequential()

# BLOCK 1
model.add(Conv2D(filters = 64, kernel_size = (3, 3), activation = 'relu', padding = 'same', name = 'block1_conv1', input_shape = (224, 224, 3)))   
model.add(Conv2D(filters = 64, kernel_size = (3, 3), activation = 'relu', padding = 'same', name = 'block1_conv2'))
model.add(MaxPooling2D(pool_size = (2, 2), strides = (2, 2), name = 'block1_pool'))

# BLOCK2
model.add(Conv2D(filters = 128, kernel_size = (3, 3), activation = 'relu', padding = 'same', name = 'block2_conv1'))   
model.add(Conv2D(filters = 128, kernel_size = (3, 3), activation = 'relu', padding = 'same', name = 'block2_conv2'))
model.add(MaxPooling2D(pool_size = (2, 2), strides = (2, 2), name = 'block2_pool'))

# BLOCK3
model.add(Conv2D(filters = 256, kernel_size = (3, 3), activation = 'relu', padding = 'same', name = 'block3_conv1'))   
model.add(Conv2D(filters = 256, kernel_size = (3, 3), activation = 'relu', padding = 'same', name = 'block3_conv2'))
model.add(Conv2D(filters = 256, kernel_size = (3, 3), activation = 'relu', padding = 'same', name = 'block3_conv3'))
model.add(MaxPooling2D(pool_size = (2, 2), strides = (2, 2), name = 'block3_pool'))

# BLOCK4
model.add(Conv2D(filters = 512, kernel_size = (3, 3), activation = 'relu', padding = 'same', name = 'block4_conv1'))   
model.add(Conv2D(filters = 512, kernel_size = (3, 3), activation = 'relu', padding = 'same', name = 'block4_conv2'))
model.add(Conv2D(filters = 512, kernel_size = (3, 3), activation = 'relu', padding = 'same', name = 'block4_conv3'))
model.add(MaxPooling2D(pool_size = (2, 2), strides = (2, 2), name = 'block4_pool'))

# BLOCK5
model.add(Conv2D(filters = 512, kernel_size = (3, 3), activation = 'relu', padding = 'same', name = 'block5_conv1'))   
model.add(Conv2D(filters = 512, kernel_size = (3, 3), activation = 'relu', padding = 'same', name = 'block5_conv2'))
model.add(Conv2D(filters = 512, kernel_size = (3, 3), activation = 'relu', padding = 'same', name = 'block5_conv3'))
model.add(MaxPooling2D(pool_size = (2, 2), strides = (2, 2), name = 'block5_pool'))

model.add(Flatten())
model.add(Dense(4096, activation = 'relu', name = 'fc1'))
model.add(Dropout(0.5))
model.add(Dense(4096, activation = 'relu', name = 'fc2'))
model.add(Dropout(0.5))
model.add(Dense(1000, activation = 'softmax', name = 'prediction'))

model.summary()
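As a sanity check on what `model.summary()` reports, the parameter count can be worked out by hand. The sketch below uses illustrative helper names and assumes 3×3 kernels with biases, the block widths used above, and a 224×224×3 input:

```python
def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of one conv layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(in_units, out_units):
    """Weights plus biases of one fully connected layer."""
    return in_units * out_units + out_units


def vgg16_param_count(input_size=224, num_classes=1000):
    # (output channels, number of conv layers) for blocks 1 to 5
    blocks = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]
    total, channels, size = 0, 3, input_size
    for width, n_convs in blocks:
        for _ in range(n_convs):
            total += conv_params(channels, width)
            channels = width
        size //= 2  # the max-pool at the end of each block halves H and W
    flat = size * size * channels  # input size of fc1 after Flatten
    for units in (4096, 4096, num_classes):
        total += dense_params(flat, units)
        flat = units
    return total


print(vgg16_param_count())  # 138357544
```

Most of these parameters sit in the first fully connected layer, which maps the flattened 7×7×512 feature map to 4096 units.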

That completes a hand-written Keras implementation of VGG16. The next step is to train the model.
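A minimal sketch of what the training setup could look like follows. The optimizer settings (SGD, learning rate 0.01, momentum 0.9) are assumptions in the spirit of the VGG paper, not values from these notes, and a tiny stand-in network with random data is used so the example runs quickly. Older Keras releases spell the `learning_rate` argument `lr`.

```python
import numpy as np
from keras import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import SGD


def compile_for_training(model, learning_rate=0.01, momentum=0.9):
    """Attach SGD with momentum and a categorical cross-entropy loss (assumed settings)."""
    model.compile(optimizer=SGD(learning_rate=learning_rate, momentum=momentum),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model


# Tiny stand-in network: same kind of layers as VGG16, far fewer parameters.
toy = Sequential([
    Conv2D(8, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(10, activation='softmax'),
])
compile_for_training(toy)

# Random images and one-hot labels, only to demonstrate the call sequence.
x_train = np.random.rand(16, 32, 32, 3).astype('float32')
y_train = np.eye(10, dtype='float32')[np.random.randint(0, 10, size=16)]
history = toy.fit(x_train, y_train, batch_size=8, epochs=1, verbose=0)
print(history.history['loss'])
```

The same `compile_for_training` helper could be applied to the full `model` built above, given a real labelled dataset of 224×224 images.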

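That said, the Keras applications module (keras.applications) provides Keras models with pretrained weights, which can be used for prediction, feature extraction, and fine-tuning.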

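III. The VGG16 Model in keras.applications

1. Model information

The VGG16 weights were trained on ImageNet. The model works with both the Theano and TensorFlow backends and accepts both the channels_first and channels_last data formats. Its default input size is 224x224.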

import keras

model = keras.applications.vgg16.VGG16(include_top=True, weights='imagenet', input_tensor=None, input_shape=None, pooling=None, classes=1000)

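Parameters

include_top: whether to keep the three fully connected layers at the top of the network
weights: None means random initialization, i.e. no pretrained weights are loaded; 'imagenet' loads the weights pretrained on ImageNet
input_tensor: optional Keras tensor to use as the image input of the model
input_shape: optional; only takes effect when include_top=False. It should be a tuple of length 3 giving the shape of the input image, with width and height no smaller than 48, e.g. (200, 200, 3)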

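pooling: when include_top=False, this selects the pooling applied to the output. None means no pooling, so the output of the last convolutional block is a 4D tensor; 'avg' applies global average pooling; 'max' applies global max pooling.
classes: optional number of classes to classify images into; only usable when include_top=True and no pretrained weights are loaded.

Return value

A Keras model object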

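2. Pretrained weights

Note: the weights .h5 file is expected under ~/.keras/models. If it is not there, it is downloaded automatically when the model is loaded in the previous step.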
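A quick way to see whether the weights are already cached is to look for the file directly. This is a minimal sketch: the helper name is illustrative, and the file name is the one Keras has commonly used for the full TensorFlow-ordered VGG16 weights, which is an assumption that may not hold for every Keras version.

```python
import os

# Assumed file name for the full VGG16 ImageNet weights; may differ by Keras version.
WEIGHTS_FILE = 'vgg16_weights_tf_dim_ordering_tf_kernels.h5'


def cached_weights_path(filename=WEIGHTS_FILE):
    """Return the default location where Keras caches downloaded model weights."""
    return os.path.join(os.path.expanduser('~'), '.keras', 'models', filename)


path = cached_weights_path()
if os.path.exists(path):
    print('found cached weights:', path)
else:
    print('not cached yet; expected location:', path)
```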

3. Prediction

from keras.preprocessing import image
from keras.applications.imagenet_utils import preprocess_input, decode_predictions
import numpy as np
import time

# The timing below covers image loading, preprocessing and inference
t0 = time.time()

img = image.load_img('VGG_16_CAT.jpg', target_size = (224, 224))
x = image.img_to_array(img) # 3D array (224, 224, 3)
x = np.expand_dims(x, axis = 0) # 4D batch (1, 224, 224, 3)
x = preprocess_input(x) # ImageNet preprocessing
print(x.shape)
y_pred = model.predict(x) # predicted class probabilities

t1 = time.time()

print("Predictions:", decode_predictions(y_pred)) # top five (class ID, class name, probability)
print("Elapsed:", str((t1-t0)*1000), "ms")

(1, 224, 224, 3)
Predictions: [[('n02123045', 'tabby', 0.73477006), ('n02124075', 'Egyptian_cat', 0.07941937), ('n02123159', 'tiger_cat', 0.07054488), ('n02883205', 'bow_tie', 0.019230891), ('n04553703', 'washbasin', 0.013854385)]]
Elapsed: 539.731502532959 ms
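For VGG16, the preprocessing step in its default "caffe" mode reorders the channels from RGB to BGR and subtracts the per-channel ImageNet mean, without rescaling to [0, 1]. The NumPy sketch below mirrors that behaviour as I understand it; the function name is illustrative, and channels_last input is assumed.

```python
import numpy as np

# Per-channel ImageNet means in BGR order, as used for VGG-style preprocessing.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype='float32')


def caffe_style_preprocess(x):
    """RGB -> BGR, then zero-center each channel (channels_last input assumed)."""
    x = np.asarray(x, dtype='float32')
    x = x[..., ::-1]              # reverse the last axis: RGB becomes BGR
    return x - IMAGENET_BGR_MEAN  # broadcast over batch, height and width


# One pixel with R=10, G=20, B=30 to make the channel swap visible.
pixel = np.array([[[[10.0, 20.0, 30.0]]]], dtype='float32')
out = caffe_style_preprocess(pixel)
print(out.shape)
print(out[0, 0, 0])
```

The input array is not modified; a new array is returned.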

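4. Saving and restoring a trained model with Keras

Once you have finished training with Keras, you can save the network in HDF5 format; this requires h5py to be installed. HDF5 is well suited to storing large amounts of numerical data and working with it from NumPy. For example, you can slice into a multi-terabyte dataset stored on disk as if it were an ordinary NumPy array. You can also store several datasets in a single file, iterate over them, and inspect their .shape and .dtype attributes.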
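The HDF5 features mentioned above can be tried directly with h5py. This is a minimal sketch using a small temporary file; the dataset names `images` and `labels` are made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'demo.h5')

# Store two datasets in a single file.
with h5py.File(path, 'w') as f:
    f.create_dataset('images', data=np.arange(4000, dtype='float32').reshape(1000, 4))
    f.create_dataset('labels', data=np.arange(1000, dtype='int64'))

with h5py.File(path, 'r') as f:
    # Iterate over the datasets and inspect their metadata without loading them.
    for name in f:
        print(name, f[name].shape, f[name].dtype)
    # Slicing reads only the requested rows from disk into a NumPy array.
    chunk = f['images'][10:20]

print(chunk.shape)  # (10, 4)
```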

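Saving weights

To save the trained weights, call save_weights directly.

model.save_weights("my_model_weights.h5")

Loading the saved weights

To restore them, call load_weights on a model with the same architecture and pass the same file name.

model.load_weights("my_model_weights.h5")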
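A save and load round trip can be checked end to end on a small model. The sketch below uses a tiny stand-in network (the `build_toy_model` helper is illustrative) and a file name ending in `.weights.h5`, since recent Keras releases require that suffix while older ones accept any `.h5` name.

```python
import os
import tempfile

import numpy as np
from keras import Sequential
from keras.layers import Dense


def build_toy_model():
    """Small stand-in network; load_weights needs the same architecture."""
    return Sequential([
        Dense(4, activation='relu', input_shape=(8,)),
        Dense(2, activation='softmax'),
    ])


weights_path = os.path.join(tempfile.mkdtemp(), 'toy.weights.h5')

source = build_toy_model()
source.save_weights(weights_path)

restored = build_toy_model()         # fresh random weights
restored.load_weights(weights_path)  # now identical to `source`

x = np.random.rand(3, 8).astype('float32')
same = np.allclose(source.predict(x, verbose=0), restored.predict(x, verbose=0))
print(same)
```

Only the weights are stored this way; the architecture has to be rebuilt in code before calling load_weights.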

References:

Very Deep Convolutional Networks for Large-Scale Image Recognition

Keras Chinese documentation

How to build a complex deep learning model from scratch with Keras + TensorFlow