[ MOOC Course Study ] Artificial Intelligence Practice: TensorFlow Notes_CH8_1 Reproducing an Existing Convolutional Neural Network

VGG Paper Reading Notes

"Very Deep Convolutional Networks for Large-Scale Image Recognition", published at ICLR 2015.

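ConvNet Configurations
To isolate the effect of depth on model performance, all our convolutional layers share the same design. This section first describes the generic ConvNet architecture, then the specific configurations that were evaluated, and finally discusses our design choices and compares them with earlier networks.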
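1.1 Architecture
(1) Training input: fixed-size 224×224 RGB images.
(2) Preprocessing: subtract the mean RGB value, computed on the training set, from every pixel.
(3) Convolution kernels: a stack of 3×3 convolutional layers with stride 1; padding (1 pixel for a 3×3 kernel) keeps the spatial resolution unchanged after each convolution.
(4) Spatial pooling: max pooling follows each stack of conv layers (five max-pooling layers in total), using a 2×2 window with stride 2.
(5) Fully connected layers: after feature extraction come three fully connected layers. The first two have 4096 channels each, the third has 1000 channels (one per ILSVRC class), and a final soft-max layer outputs the class probabilities.
(6) All hidden layers use the ReLU non-linearity.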
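The rules above fully determine the feature-map sizes. The following minimal sketch, assuming configuration D (VGG-16) and a 224×224 input, traces them: a stride-1 3×3 convolution with 1-pixel padding keeps the resolution, and each 2×2/stride-2 max-pool halves it, so five pools take 224 down to 7. The names `CFG_D` and `trace_shapes` are illustrative, not taken from the course code.

```python
# Configuration D (VGG-16): numbers are conv output channels, "M" is a 2x2/stride-2 max-pool.
CFG_D = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]


def trace_shapes(cfg, size=224, in_channels=3):
    """Return the (height, width, channels) produced by every layer in cfg."""
    shapes = []
    h = w = size
    c = in_channels
    for layer in cfg:
        if layer == "M":
            # 2x2 max-pool with stride 2 halves the spatial size
            h, w = h // 2, w // 2
        else:
            # out = (in + 2*pad - kernel) // stride + 1, which equals in for pad=1, kernel=3, stride=1
            h = (h + 2 * 1 - 3) // 1 + 1
            w = (w + 2 * 1 - 3) // 1 + 1
            c = layer
        shapes.append((h, w, c))
    return shapes


if __name__ == "__main__":
    shapes = trace_shapes(CFG_D)
    for layer, shape in zip(CFG_D, shapes):
        print("maxpool" if layer == "M" else "conv3-%d" % layer, shape)
    h, w, c = shapes[-1]
    print("flattened input to the first FC-4096 layer:", h * w * c)
```

The 13 conv layers plus the 3 fully connected layers give the 16 weight layers of VGG-16, and the last pool leaves a 7×7×512 map (25088 values) for the first fully connected layer.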
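1.2 Configurations
(1) Each column of Table 1 is a different network; the networks differ only in depth (pooling layers are not counted as layers). The conv layers are rather narrow: the first has only 64 channels, and the channel count doubles after each max-pooling layer until it reaches 512.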

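(2) Table 2 lists the number of parameters of each model. Although the networks get deeper, the number of weights does not grow much, because most of the parameters sit in the fully connected layers.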

Computing the number of parameters of each model:
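As a worked example, the sketch below counts weights plus biases for configuration D (VGG-16), assuming a 224×224 input so that the first fully connected layer sees a 7×7×512 map. The layer lists are written out by hand for illustration; the total comes to about 138M, and roughly 89% of it sits in the three fully connected layers, which is why adding conv layers barely changes the totals in Table 2.

```python
# (input channels, output channels) of the 13 3x3 conv layers in configuration D
CONV_D = [(3, 64), (64, 64),
          (64, 128), (128, 128),
          (128, 256), (256, 256), (256, 256),
          (256, 512), (512, 512), (512, 512),
          (512, 512), (512, 512), (512, 512)]
# (inputs, outputs) of the 3 fully connected layers; 7*7*512 = 25088 flattened inputs
FC_D = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]


def conv_params(c_in, c_out, k=3):
    # k*k*c_in weights per output channel, plus one bias per output channel
    return k * k * c_in * c_out + c_out


def fc_params(n_in, n_out):
    # full weight matrix plus one bias per output unit
    return n_in * n_out + n_out


conv_total = sum(conv_params(ci, co) for ci, co in CONV_D)
fc_total = sum(fc_params(ni, no) for ni, no in FC_D)
total = conv_total + fc_total

print("conv layers:", conv_total)  # 14714688
print("FC layers:  ", fc_total)    # 123642856
print("total:      ", total)       # 138357544
print("FC share: %.1f%%" % (100.0 * fc_total / total))
```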
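1.3 Discussion
Why stack 3×3 kernels? A stack of two $3\times3$ conv layers (without pooling in between) has the same effective receptive field as one $5\times5$ layer, and a stack of three has the same receptive field as one $7\times7$ layer.
Advantages: first, a stack of three conv layers contains three ReLU non-linearities instead of one, which makes the decision function more discriminative; second, three $3\times3$ layers have fewer parameters. With $C$ input and output channels, the stack has $3\cdot(3^2C^2)=27C^2$ weights versus $7^2C^2=49C^2$ for a single $7\times7$ layer, which can be seen as imposing a regularization on the $7\times7$ filters.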
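Both claims are simple arithmetic. The sketch below, assuming stride-1 convolutions with no pooling inside the stack and counting weights only (no biases), checks that each extra $k\times k$ layer widens the receptive field by $k-1$ pixels and compares the weight counts for $C = 512$. The helper names are made up for this illustration.

```python
def stacked_receptive_field(n_layers, k=3):
    # a single pixel sees 1x1; each extra stride-1 k x k layer adds k - 1 pixels
    return 1 + n_layers * (k - 1)


def stack_weights(n_layers, k, channels):
    # weights only (biases ignored); every layer maps C channels to C channels
    return n_layers * (k * k * channels * channels)


C = 512
print(stacked_receptive_field(2), stacked_receptive_field(3))  # 5 7
three_3x3 = stack_weights(3, 3, C)  # 27 * C^2
one_7x7 = stack_weights(1, 7, C)    # 49 * C^2
print(three_3x3, one_7x7, round(one_7x7 / three_3x3, 2))
```

The ratio 49/27 ≈ 1.81 means a single 7×7 layer needs about 81% more weights than the equivalent stack of three 3×3 layers.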
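Classification Framework
2.1 Training
The training procedure largely follows AlexNet, except for how the input crops are sampled from multi-scale training images.
Training uses mini-batch gradient descent with batch size = 256;
with momentum, momentum = 0.9;
with L2 regularization (weight decay), penalty coefficient 5×10⁻⁴ (0.0005);
dropout ratio 0.5 for the first two fully connected layers;
initial learning rate 0.01, divided by 10 whenever the validation accuracy stops improving, which happened three times in total;
training stops after 370K iterations (74 epochs);
data augmentation: random cropping, random horizontal flipping, and random RGB colour shift;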
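The optimizer settings above can be sketched in NumPy as follows, using the values the paper reports (learning rate 0.01, momentum 0.9, weight decay 5×10⁻⁴, at most three ×0.1 decays). This is a minimal sketch, not the authors' code: `sgd_momentum_step` and `PlateauDecay` are hypothetical helpers, and the plateau rule is simplified to "decay as soon as validation accuracy fails to improve", with no patience window.

```python
import numpy as np


def sgd_momentum_step(w, v, grad, lr, momentum=0.9, weight_decay=5e-4):
    """One momentum-SGD update with an L2 penalty (weight decay):
    v <- momentum * v - lr * (grad + weight_decay * w);  w <- w + v
    """
    v = momentum * v - lr * (grad + weight_decay * w)
    return w + v, v


class PlateauDecay:
    """Multiply the learning rate by `factor` when validation accuracy does not
    improve, at most `max_decays` times (simplified: no patience window)."""

    def __init__(self, lr=1e-2, factor=0.1, max_decays=3):
        self.lr, self.factor, self.max_decays = lr, factor, max_decays
        self.best = float("-inf")
        self.decays = 0

    def update(self, val_acc):
        if val_acc > self.best:
            self.best = val_acc          # still improving: keep the current rate
        elif self.decays < self.max_decays:
            self.lr *= self.factor       # plateau: divide the rate by 10
            self.decays += 1
        return self.lr


# Usage with made-up numbers
w, v = np.array([1.0]), np.zeros(1)
w, v = sgd_momentum_step(w, v, grad=np.array([0.5]), lr=0.01)

sched = PlateauDecay()
history = [0.50, 0.60, 0.60, 0.70, 0.65, 0.64, 0.63]  # fake validation accuracies
lrs = [sched.update(acc) for acc in history]
print(lrs)
```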
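Two ways of setting the training image size: let S be the smallest side of the isotropically rescaled training image. The first approach is single-scale training with a fixed S (256 or 384); a 224×224 crop is taken at random from the rescaled image. In principle S can be any value not smaller than 224. The second approach is multi-scale training: each image is rescaled individually with S drawn at random from a range $[S_{min}, S_{max}]$ (the paper uses $S_{min}=256$, $S_{max}=512$). Since objects in images can appear at different sizes, this helps during training; it can be seen as augmenting the training set by scale jittering.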
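The multi-scale sampling described above can be sketched in NumPy as follows, assuming an image stored as an (H, W, C) array and the paper's range [256, 512]. The nearest-neighbour resize in `isotropic_rescale` is a stand-in chosen to keep the example dependency-free; a real pipeline would use a proper interpolating resize. Both function names are hypothetical. Setting `s_min = s_max` gives single-scale training.

```python
import numpy as np


def isotropic_rescale(img, s):
    """Nearest-neighbour resize so the smaller side equals s, keeping the aspect ratio."""
    h, w = img.shape[:2]
    scale = s / min(h, w)
    new_h = max(s, int(round(h * scale)))
    new_w = max(s, int(round(w * scale)))
    # map every output row/column back to a source row/column
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return img[rows][:, cols]


def random_training_crop(img, rng, s_min=256, s_max=512, crop=224):
    """Scale jittering: draw S uniformly from [s_min, s_max], rescale,
    take a random crop x crop patch, and flip it horizontally with p = 0.5."""
    s = int(rng.integers(s_min, s_max + 1))
    scaled = isotropic_rescale(img, s)
    h, w = scaled.shape[:2]
    top = int(rng.integers(0, h - crop + 1))
    left = int(rng.integers(0, w - crop + 1))
    patch = scaled[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # horizontal flip
    return patch, s


rng = np.random.default_rng(0)
image = np.zeros((300, 400, 3), dtype=np.uint8)  # dummy image
patch, s = random_training_crop(image, rng)
print(patch.shape, s)
```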
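The paper points out that weight initialization matters a great deal: because gradients in deep networks are unstable, a bad initialization can stall learning. We therefore first train a shallow network (configuration A) from random initialization, and then use its trained layers to initialize the deeper networks (the first four conv layers and the last three fully connected layers).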
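2.2 Testing
At test time, given a trained ConvNet and an input image, classification proceeds as follows:
First, the image is isotropically rescaled so that its smallest side equals a predefined test scale Q;
Then, the fully connected layers are converted into convolutional layers (the first into a 7×7 conv layer, the last two into 1×1 conv layers), and the resulting fully convolutional network is applied to the whole, uncropped image. The output is a class score map whose number of channels equals the number of classes and whose spatial size depends on the input image size;
Finally, the class score map is spatially averaged to obtain a fixed-size vector of class scores.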
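The FC-to-conv conversion can be checked numerically. The toy NumPy sketch below converts only one fully connected layer, assumes its input was an (H, W, C) feature map flattened in row-major order, and omits ReLU and soft-max; the sizes and helper names are made up. On a 7×7 map the converted layer reproduces the FC output exactly; on a larger map it yields a score map, which is then averaged into one score vector.

```python
import numpy as np


def fc_to_conv_kernel(fc_w, k, in_channels):
    """Reshape FC weights (k*k*in_channels, n_out) into a k x k conv kernel
    (k, k, in_channels, n_out); assumes a row-major (H, W, C) flatten."""
    return fc_w.reshape(k, k, in_channels, -1)


def conv2d_valid(x, kernel):
    """Naive stride-1, no-padding convolution (cross-correlation, as in CNN libraries).
    x: (H, W, C), kernel: (k, k, C, n_out) -> (H - k + 1, W - k + 1, n_out)."""
    k = kernel.shape[0]
    out_h, out_w = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.zeros((out_h, out_w, kernel.shape[3]))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + k, j:j + k, :]
            out[i, j] = np.tensordot(patch, kernel, axes=([0, 1, 2], [0, 1, 2]))
    return out


rng = np.random.default_rng(0)
k, c, n_classes = 7, 4, 5              # toy sizes (VGG-16 would use k=7, c=512)
fc_w = rng.standard_normal((k * k * c, n_classes))
kernel = fc_to_conv_kernel(fc_w, k, c)

# 1) On a 7x7 feature map the conv layer reproduces the FC layer.
feat = rng.standard_normal((k, k, c))
fc_scores = feat.reshape(-1) @ fc_w
conv_scores = conv2d_valid(feat, kernel)[0, 0]

# 2) A larger map (from a larger, uncropped image) gives a class score map...
big_feat = rng.standard_normal((9, 11, c))
score_map = conv2d_valid(big_feat, kernel)     # shape (3, 5, n_classes)
# ...which spatial averaging turns into one fixed-size score vector.
class_scores = score_map.mean(axis=(0, 1))
print(score_map.shape, class_scores.shape)
```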
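We also augment the test set with horizontal flipping: the soft-max class posteriors of the original and the flipped image are averaged to obtain the final score. Because the fully convolutional network is applied over the whole image, there is no need to sample multiple crops at test time, which is more efficient than multi-crop evaluation. Multi-crop evaluation and dense evaluation with the fully convolutional network are nevertheless complementary, and combining them improves performance.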
Classification Experiments
3.1 Single-scale evaluation
Table 3 shows ConvNet performance at a single test scale.

3.2 Multi-scale evaluation
Table 4 shows ConvNet performance at multiple test scales.

3.3 Multi-crop vs. dense evaluation
Table 5 compares multi-crop and dense ConvNet evaluation and shows the effect of combining the two.

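3.4 ConvNet fusion
This part examines combining several models by averaging their soft-max class posteriors. Because the models are complementary, the combination performs better; this fusion was also used in the best competition submission.
Table 6 shows the results of fusing multiple ConvNets.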
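Averaging soft-max posteriors is the same operation used both for model fusion and for combining the original and horizontally flipped test image. A minimal NumPy sketch, with made-up logits for three classes and a hypothetical `fuse_predictions` helper:

```python
import numpy as np


def softmax(logits, axis=-1):
    # subtract the max before exponentiating for numerical stability
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_predictions(logits_per_model):
    """Average the soft-max class posteriors of several models
    (or of the original and the flipped image)."""
    probs = [softmax(np.asarray(l, dtype=float)) for l in logits_per_model]
    return np.mean(probs, axis=0)


model_a = [2.0, 1.0, 0.1]   # made-up logits
model_b = [0.5, 2.5, 0.3]
fused = fuse_predictions([model_a, model_b])
print(fused, fused.argmax())
```

Averaging probabilities rather than raw logits keeps every model on the same scale, and the fused vector is still a valid distribution that sums to 1.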

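3.5 Comparison with the state of the art
Table 7 compares the results with the current state-of-the-art methods.
Conclusion
This paper evaluates very deep convolutional networks for large-scale image classification. The results show that depth benefits classification accuracy. The appendix shows that the models generalize well to other tasks and datasets, again confirming the importance of depth in visual representations.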

Key Points of the VGG Implementation Code

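item(): a method of the ndarray class that returns the single element of an array as a plain Python object. Applied to the result of np.load, it unwraps the stored parameter dictionary, whose key-value pairs can then be iterated with dict.items().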

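Comparing tf.shape(a) and a.get_shape()
Similarity: both give the shape of tensor a.
Differences:
(1) In tf.shape(a), a can be a tensor, a list, or a NumPy array, and the result is a tensor (its value is obtained with sess.run).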

import tensorflow as tf
import numpy as np
x=tf.constant([[1,2,3],[4,5,6]])
y=[[1,2,3],[4,5,6]]
z=np.arange(24).reshape([2,3,4])
sess=tf.Session() # TensorFlow 1.x graph-mode API

x_shape=tf.shape(x)
y_shape=tf.shape(y)
z_shape=tf.shape(z)

print(x_shape) # Tensor("Shape:0", shape=(2,), dtype=int32)
print(y_shape) # Tensor("Shape_1:0", shape=(2,), dtype=int32)
print(z_shape) # Tensor("Shape_2:0", shape=(3,), dtype=int32)
print(sess.run(x_shape)) # [2 3]
print(sess.run(y_shape)) # [2 3]
print(sess.run(z_shape)) # [2 3 4]

(2) In a.get_shape(), a must be a tensor, and the result is a TensorShape object (printed like a tuple; call .as_list() to get a Python list).

import tensorflow as tf
import numpy as np
x=tf.constant([[1,2,3],[4,5,6]])
y=[[1,2,3],[4,5,6]]
z=np.arange(24).reshape([2,3,4])
sess=tf.Session()

x_shape=x.get_shape()
print(x_shape) # (2, 3); x_shape.as_list() gives [2, 3]

# Each of the next two calls raises an AttributeError, so run them one at a time
y_shape=y.get_shape()
print(y_shape) # AttributeError: 'list' object has no attribute 'get_shape'

z_shape=z.get_shape()
print(z_shape) # AttributeError: 'numpy.ndarray' object has no attribute 'get_shape'

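tf.nn.bias_add(value, bias): adds bias to the weighted sum value (for example the output of tf.nn.conv2d); bias must be 1-D, with the same size as the last dimension of value.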
np.argsort(list): returns the indices that would sort the list in ascending order (not the sorted values themselves).
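A short example of the distinction, with made-up class probabilities: `np.argsort` gives positions, and reversing it is a common way to pick the top-k classes.

```python
import numpy as np

probs = np.array([0.05, 0.40, 0.10, 0.30, 0.15])  # made-up class probabilities
order = np.argsort(probs)          # indices that sort probs in ascending order
top3 = np.argsort(probs)[::-1][:3]  # indices of the 3 largest values, largest first
print(order)         # [0 2 4 3 1]
print(top3)          # [1 3 4]
print(probs[order])  # the sorted values, same as np.sort(probs)
```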
os.getcwd(): returns the current working directory.
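np.save("name.npy", array): writes the array to the file name.npy; if the path does not end with the .npy extension, the extension is appended automatically. variable = np.load("name.npy", encoding="...").item(): reads name.npy back into the variable. The encoding argument can be omitted; it accepts 'latin1', 'ASCII', or 'bytes', defaults to 'ASCII', and only matters when loading pickles written by Python 2. Since NumPy 1.16.3, loading a file that stores a pickled object such as a dictionary also requires allow_pickle=True.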
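The round trip below sketches how a parameter dictionary can be stored in and read back from a .npy file. The layout (layer name mapped to [weights, biases]) is an assumption modelled on how files like vgg16.npy are commonly organised, and the tiny shapes are toy values; the file is written to a temporary directory.

```python
import os
import tempfile

import numpy as np

# Hypothetical parameter dict: layer name -> [weights, biases], with toy shapes.
params = {
    "conv1_1": [np.zeros((3, 3, 3, 4), dtype=np.float32), np.zeros(4, dtype=np.float32)],
    "fc8": [np.zeros((8, 10), dtype=np.float32), np.zeros(10, dtype=np.float32)],
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "params")      # no extension on purpose
    np.save(path, params)                   # writes params.npy (.npy is appended)
    saved_file = path + ".npy"
    exists = os.path.exists(saved_file)
    # The dict is stored as a pickled 0-d object array: allow_pickle=True is
    # needed to read it back, and .item() unwraps the dict itself.
    loaded = np.load(saved_file, allow_pickle=True).item()

for name, (w, b) in loaded.items():         # dict.items() iterates key-value pairs
    print(name, w.shape, b.shape)
```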

tf.split(value, num_or_size_splits, axis=0, num=None, name="split")

def split(value, num_or_size_splits, axis=0, num=None, name="split"):
  """Splits a tensor into sub tensors.

  If `num_or_size_splits` is an integer type, `num_split`, then splits `value`
  along dimension `axis` into `num_split` smaller tensors.
  Requires that `num_split` evenly divides `value.shape[axis]`.

  If `num_or_size_splits` is not an integer type, it is presumed to be a Tensor
  `size_splits`, then splits `value` into `len(size_splits)` pieces. The shape
  of the `i`-th piece has the same size as the `value` except along dimension
  `axis` where the size is `size_splits[i]`.

  For example:

  ```python
  # 'value' is a tensor with shape [5, 30]
  # Split 'value' into 3 tensors with sizes [4, 15, 11] along dimension 1
  split0, split1, split2 = tf.split(value, [4, 15, 11], 1)
  tf.shape(split0)  # [5, 4]
  tf.shape(split1)  # [5, 15]
  tf.shape(split2)  # [5, 11]
  # Split 'value' into 3 tensors along dimension 1
  split0, split1, split2 = tf.split(value, num_or_size_splits=3, axis=1)
  tf.shape(split0)  # [5, 10]
  ```

  Args:
    value: The `Tensor` to split.
    num_or_size_splits: Either a 0-D integer `Tensor` indicating the number of
      splits along split_dim or a 1-D integer `Tensor` integer tensor containing
      the sizes of each output tensor along split_dim. If a scalar then it must
      evenly divide `value.shape[axis]`; otherwise the sum of sizes along the
      split dimension must match that of the `value`.
    axis: A 0-D `int32` `Tensor`. The dimension along which to split.
      Must be in the range `[-rank(value), rank(value))`. Defaults to 0.
    num: Optional, used to specify the number of outputs when it cannot be
      inferred from the shape of `size_splits`.
    name: A name for the operation (optional).

  Returns:
    if `num_or_size_splits` is a scalar returns `num_or_size_splits` `Tensor`
    objects; if `num_or_size_splits` is a 1-D Tensor returns
    `num_or_size_splits.get_shape[0]` `Tensor` objects resulting from splitting
    `value`.

  Raises:
    ValueError: If `num` is unspecified and cannot be inferred.
  """
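For readers without TensorFlow at hand, NumPy's `np.split` behaves similarly, with one pitfall: `tf.split` takes the sizes of the pieces, while `np.split` takes the indices where to cut. The sketch below reproduces the docstring example above with NumPy (a NumPy analogue, not the TensorFlow call itself).

```python
import numpy as np

value = np.zeros((5, 30))

# tf.split(value, [4, 15, 11], 1) takes piece SIZES; np.split takes cut INDICES,
# so turn the sizes into cumulative split points first.
sizes = [4, 15, 11]
cut_points = np.cumsum(sizes)[:-1]  # split points 4 and 19
split0, split1, split2 = np.split(value, cut_points, axis=1)
print(split0.shape, split1.shape, split2.shape)  # (5, 4) (5, 15) (5, 11)

# An integer asks for that many equal pieces, like num_or_size_splits=3;
# it must divide the axis length evenly (30 / 3 = 10).
a, b, c = np.split(value, 3, axis=1)
print(a.shape)  # (5, 10)
```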

tf.concat(values, axis, name="concat")

def concat(values, axis, name="concat"):
  """Concatenates tensors along one dimension.

  Concatenates the list of tensors `values` along dimension `axis`.  If
  `values[i].shape = [D0, D1, ... Daxis(i), ...Dn]`, the concatenated
  result has shape

      [D0, D1, ... Raxis, ...Dn]

  where

      Raxis = sum(Daxis(i))

  That is, the data from the input tensors is joined along the `axis`
  dimension.

  The number of dimensions of the input tensors must match, and all dimensions
  except `axis` must be equal.

  For example:

  ```python
  t1 = [[1, 2, 3], [4, 5, 6]]
  t2 = [[7, 8, 9], [10, 11, 12]]
  tf.concat([t1, t2], 0)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
  tf.concat([t1, t2], 1)  # [[1, 2, 3, 7, 8, 9], [4, 5, 6, 10, 11, 12]]

  # tensor t3 with shape [2, 3]
  # tensor t4 with shape [2, 3]
  tf.shape(tf.concat([t3, t4], 0))  # [4, 3]
  tf.shape(tf.concat([t3, t4], 1))  # [2, 6]
  ```

  Note: If you are concatenating along a new axis consider using stack.
  E.g.

  ```python
  tf.concat([tf.expand_dims(t, axis) for t in tensors], axis)
  ```

  can be rewritten as

  ```python
  tf.stack(tensors, axis=axis)
  ```

  Args:
    values: A list of `Tensor` objects or a single `Tensor`.
    axis: 0-D `int32` `Tensor`.  Dimension along which to concatenate. Must be
      in the range `[-rank(values), rank(values))`.
    name: A name for the operation (optional).

  Returns:
    A `Tensor` resulting from concatenation of the input tensors.
  """
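Split and concat are typically combined in VGG preprocessing: the RGB input is split into channels, each channel has its mean subtracted (step (2) of the architecture), and the channels are re-joined in BGR order for weights converted from Caffe. The NumPy sketch below mirrors that pattern with `np.split`/`np.concatenate` standing in for `tf.split`/`tf.concat`; the mean values and the function name are assumptions, not taken from the course code shown here.

```python
import numpy as np

# Assumed per-channel means in B, G, R order (values commonly used with the
# Caffe-converted VGG-16 weights); treat them as an assumption.
VGG_MEAN_BGR = [103.939, 116.779, 123.68]


def rgb_to_bgr_minus_mean(images):
    """images: (N, H, W, 3) RGB array in [0, 255] -> BGR array with the mean removed."""
    # split along the channel axis, like tf.split(images, 3, axis=3)
    red, green, blue = np.split(images, 3, axis=3)
    # re-join in B, G, R order, like tf.concat([...], axis=3)
    return np.concatenate([blue - VGG_MEAN_BGR[0],
                           green - VGG_MEAN_BGR[1],
                           red - VGG_MEAN_BGR[2]], axis=3)


batch = np.zeros((1, 2, 2, 3))
batch[..., 0] = 200.0  # set the red channel
out = rgb_to_bgr_minus_mean(batch)
print(out.shape)       # (1, 2, 2, 3)
print(out[0, 0, 0])    # blue, green, red after mean subtraction
```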

Plotting


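# Plotting (matplotlib for figures, scikit-image for reading images)
import matplotlib.pyplot as plt
from skimage import io

fig = plt.figure("figure name") # create a figure object
ax = fig.add_subplot(m, n, k) # split the canvas into m rows and n columns and draw in cell k, counted left to right, top to bottom (e.g. fig.add_subplot(2, 2, 1), or the shorthand 221)
ax.plot(x, y)
plt.show()

img = io.imread(image_path) # read an image from image_path
ax.imshow(img) # draw the image in the subplot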

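Files in the VGG Source Code

app.py reads the image to classify and visualizes the result
vgg16.py rebuilds the network and loads its parameters
utils.py helper functions, including
- reading the input image
- computing probabilities as percentages
Nclasses.py contains the labels dictionary
vgg16.npy contains all parameters of the network (the configuration labeled D in Table 1)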
