Learning CNN Architectures: Chapter 2, AlexNet (with TensorFlow code)

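In 2012, Alex Krizhevsky, a student of Hinton, proposed the CNN model AlexNet, which can be seen as a deeper and wider version of LeNet. That same year AlexNet won the ImageNet competition by a wide margin, cutting the top-5 error rate to 16.4%, a large improvement over the runner-up's 26.2%, while using less than half as many parameters as the runner-up's model. AlexNet was arguably the first major success of neural networks after their long slump: it established deep learning as the dominant approach in computer vision and helped drive its spread into speech recognition, natural language processing, reinforcement learning, and other fields.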

Return of the King: AlexNet

Highlights:

A deeper network
Data augmentation
ReLU
Dropout
LRN

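Network architecture:
1. It successfully used ReLU as the activation function and showed that, in deeper networks, ReLU outperforms Sigmoid.
2. It used LRN (Local Response Normalization) layers, which create competition among neighboring neurons: larger responses become relatively larger while neurons with weaker responses are suppressed. This improved the model's generalization.
3. It used overlapping max pooling. The paper sets the stride smaller than the pooling kernel size, so neighboring pooling windows overlap, which enriches the features.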
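
Item 3 can be illustrated with a small pure-Python sketch. It assumes a toy 5×5 feature map and a hypothetical helper `max_pool2d` (not taken from the paper or from TensorFlow), and compares a 3×3 window with stride 2 (overlapping) against a 2×2 window with stride 2 (non-overlapping).

```python
def max_pool2d(x, size, stride):
    """Max-pool a 2-D list of numbers with a square window and no padding."""
    h, w = len(x), len(x[0])
    out = []
    for i in range(0, h - size + 1, stride):
        row = []
        for j in range(0, w - size + 1, stride):
            # Take the maximum over the size x size window whose top-left corner is (i, j).
            row.append(max(x[a][b]
                           for a in range(i, i + size)
                           for b in range(j, j + size)))
        out.append(row)
    return out


# Toy feature map holding the values 0..24 in row-major order.
feature_map = [[r * 5 + c for c in range(5)] for r in range(5)]

# Stride (2) smaller than the window (3): adjacent windows share one row/column.
overlapping = max_pool2d(feature_map, size=3, stride=2)
# Stride equal to the window (2): windows do not share any element.
non_overlapping = max_pool2d(feature_map, size=2, stride=2)

print(overlapping)      # [[12, 14], [22, 24]]
print(non_overlapping)  # [[6, 8], [16, 18]]
```

With the overlapping setting, the middle row and column of the toy map contribute to more than one output value, which is the overlap the paper refers to.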

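Overfitting:
1. Data augmentation: patches of the network's input size are cropped at random from the original images (and also flipped horizontally). This greatly reduces overfitting and improves generalization. The paper additionally performs PCA on the RGB pixel values of the training images and perturbs the principal components with Gaussian noise of standard deviation 0.1.
2. Dropout randomly ignores a fraction of the neurons, which helps prevent overfitting.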
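
The Dropout idea in item 2 can be sketched in a few lines of pure Python. This is an illustration only: it assumes a keep probability of 0.5 and follows the original scheme of scaling outputs at test time, rather than the "inverted dropout" that most current libraries implement. The helper names `dropout_train` and `dropout_test` are hypothetical.

```python
import random


def dropout_train(activations, keep_prob, rng):
    """Training: zero each activation independently with probability 1 - keep_prob."""
    return [a if rng.random() < keep_prob else 0.0 for a in activations]


def dropout_test(activations, keep_prob):
    """Testing: keep every unit, but scale its output by keep_prob."""
    return [a * keep_prob for a in activations]


rng = random.Random(0)          # fixed seed so the sketch is repeatable
activations = [1.0] * 10000     # toy layer output, all ones

train_out = dropout_train(activations, keep_prob=0.5, rng=rng)
test_out = dropout_test(activations, keep_prob=0.5)

kept_fraction = sum(1 for a in train_out if a != 0.0) / len(train_out)
train_mean = sum(train_out) / len(train_out)
test_mean = sum(test_out) / len(test_out)

print("fraction of units kept during training: %.3f" % kept_fraction)
print("mean output, train vs. test: %.3f vs. %.3f" % (train_mean, test_mean))
```

Scaling by the keep probability at test time keeps the expected output of the layer roughly equal to what the next layer saw during training.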

Training speed:
1. GPU computation is used to speed up training.

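Take the AlexNet architecture in the figure above as an example. The first five layers of the network are convolutional and the last three are fully connected, with a final softmax over 1000 classes. The first two layers are described in detail below.

AlexNet has five convolutional layers and three fully connected layers, considerably more than LeNet. The overall flow of a convolutional neural network is unchanged; the network is simply much deeper.
AlexNet targets a 1000-class classification problem. Input images are fixed at 256×256 with three color channels. To improve generalization and avoid overfitting, the authors randomly crop each 256×256 image to a 3×224×224 patch, which is then fed to the network for training.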
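
The random-crop step described above, together with the horizontal flip mentioned earlier, might look like the following pure-Python sketch. It assumes the image is a nested list of pixels and a flip probability of 0.5; the helpers `random_crop`, `horizontal_flip`, and `augment` are hypothetical names, and the PCA color perturbation is left out.

```python
import random


def random_crop(image, crop_size, rng):
    """Cut a crop_size x crop_size window from a random position of the image."""
    h, w = len(image), len(image[0])
    top = rng.randint(0, h - crop_size)    # randint includes both endpoints
    left = rng.randint(0, w - crop_size)
    return [row[left:left + crop_size] for row in image[top:top + crop_size]]


def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def augment(image, crop_size, rng):
    """Random crop followed by a horizontal flip with probability 0.5."""
    patch = random_crop(image, crop_size, rng)
    if rng.random() < 0.5:
        patch = horizontal_flip(patch)
    return patch


rng = random.Random(0)
# Toy 256x256 "image" whose pixels record their own (row, column) position.
image = [[(r, c) for c in range(256)] for r in range(256)]
patch = augment(image, 224, rng)

print(len(patch), len(patch[0]))  # 224 224
```

Each call to `augment` can return a different 224×224 view of the same source image, which is what lets a fixed dataset act like a larger one.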

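Because training is split across multiple GPUs, the first convolutional layer is followed by two identical branches, which speeds up training.
Consider a single branch. The first convolutional layer, conv1, uses 11×11 kernels with a stride of 4 and has 48 kernels, so its output is [48,55,55]. The 55 is puzzling, and the authors do not explain it: the usual calculation gives (224-11)/4+1 = 54.25, not 55. The explanation is that the image is padded before the convolution: it is first padded to 227×227, and then (227-11)/4+1 = 55. These feature maps pass through the relu1 unit, and the activated output is still two groups of 48×55×55. They are then normalized by LRN with a window of size 5 (n = 5 adjacent channels in the paper).
The input to the pooling layer is therefore [48,55,55]. Max pooling uses a 3×3 window with a stride of 2, so the pooled size is (55-3)/2+1 = 27 and the output is [48,27,27]. This 48×27×27 block is the final output of the first convolutional stage.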
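
The size arithmetic above can be checked with a short sketch. `output_size` is a hypothetical helper implementing the usual formula floor((input + 2·padding − kernel) / stride) + 1. The last lines assume TensorFlow's rule that `SAME` padding gives ceil(input / stride), which applies to the benchmark listing later in this post, where conv1 uses `padding='SAME'` on a 224×224 input.

```python
import math


def output_size(input_size, kernel, stride, padding=0):
    """Spatial output size of a convolution or pooling window (floor division)."""
    return (input_size + 2 * padding - kernel) // stride + 1


# Without padding, a 224-pixel input does not divide evenly.
unpadded = (224 - 11) / 4 + 1                    # 54.25

# With the image padded to 227, conv1 and pool1 give the sizes in the text.
conv1 = output_size(227, kernel=11, stride=4)    # 55
pool1 = output_size(conv1, kernel=3, stride=2)   # 27

# Assumption: TensorFlow 'SAME' padding yields ceil(input / stride).
same_conv1 = math.ceil(224 / 4)                          # 56
same_pool1 = output_size(same_conv1, kernel=3, stride=2) # 27

print(unpadded, conv1, pool1, same_conv1, same_pool1)
```

Both routes end at a 27×27 map after the first pooling layer, even though the conv1 output is 55×55 in one case and 56×56 in the other.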

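Training techniques used by AlexNet:

Data augmentation, to improve the model's generalization.
ReLU instead of Sigmoid, to speed up SGD convergence.
Dropout: the idea is similar to ensemble methods in shallow learning. Neurons in the fully connected layers (the model applies Dropout to the first two fully connected layers) are deactivated with some probability (0.5, for example). Deactivated neurons take no part in forward or backward propagation, so roughly half of the neurons are inactive at any step. At test time, all neurons are used and their outputs are multiplied by 0.5. Introducing Dropout effectively reduced overfitting.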
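Local Response Normalization: the basic idea is as follows. Suppose a block of the network has shape 13×13×256. LRN picks a spatial position and looks through all channels at that position, giving 256 numbers, which it then normalizes. The motivation is that, for each position in the 13×13 map, we probably do not want too many strongly activated neurons. Later, however, many researchers found that LRN makes little difference, so it is not considered important and is generally no longer used when training networks.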
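
As a rough illustration of LRN at a single spatial position, the sketch below normalizes a short list of channel activations. It assumes the form output = input / (bias + alpha · sum of squares over nearby channels)^beta, with the same parameter values that the benchmark listing in this post passes to `tf.nn.lrn`; `local_response_norm` is a hypothetical pure-Python helper, not a library function.

```python
def local_response_norm(channels, depth_radius=2, bias=2.0, alpha=1e-4, beta=0.75):
    """Normalize the activations at one spatial position across nearby channels."""
    n = len(channels)
    out = []
    for i, a in enumerate(channels):
        # Window of up to 2 * depth_radius + 1 channels, clipped at the edges.
        lo = max(0, i - depth_radius)
        hi = min(n, i + depth_radius + 1)
        sqr_sum = sum(v * v for v in channels[lo:hi])
        out.append(a / (bias + alpha * sqr_sum) ** beta)
    return out


# Channels 0 and 5 hold the same activation (2.0), but channel 5 sits next to
# a very strong response (100.0) in channel 6.
position = [2.0, 0.5, 0.5, 0.5, 0.5, 2.0, 100.0, 0.5]
normalized = local_response_norm(position)

print("channel 0: %.3f" % normalized[0])
print("channel 5: %.3f" % normalized[5])
```

Channel 5 comes out smaller than channel 0 although both started at 2.0, which is the suppression by a strong neighbor described in the text.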

    models.tutorials.image.alexnet.alexnet_benchmark.py

# coding:utf8
# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================



"""Timing benchmark for AlexNet inference.

To run, use:
  bazel run -c opt --config=cuda \
      models/tutorials/image/alexnet:alexnet_benchmark

Across 100 steps on batch size = 128.

Forward pass:
Run on Tesla K40c: 145 +/- 1.5 ms / batch
Run on Titan X:     70 +/- 0.1 ms / batch

Forward-backward pass:
Run on Tesla K40c: 480 +/- 48 ms / batch
Run on Titan X:    244 +/- 30 ms / batch
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
from datetime import datetime
import math
import sys
import time

from six.moves import xrange  # pylint: disable=redefined-builtin
import tensorflow as tf

FLAGS = None


def print_activations(t):
  '''
  Print the name and shape of a Tensor.
  :param t: the Tensor to describe
  :return: None
  '''
  print(t.op.name, ' ', t.get_shape().as_list())



def inference(images):
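  '''
  Define the five convolutional layers of AlexNet (the FC layers are omitted
  here because they are comparatively fast to compute).
  :param images: input image Tensor
  :return: the final pool5 layer and the list of parameters
  '''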
  parameters = []

  # conv1
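  # name_scope names the Variables created inside the scope conv1/xxx,
  # which makes it easy to tell the parameters of different layers apart.
  # 64 kernels of size 11*11*3, stride 4; weights are initialized from a
  # truncated normal distribution (standard deviation 0.1).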
  with tf.name_scope('conv1') as scope:
    kernel = tf.Variable(tf.truncated_normal([11, 11, 3, 64], dtype=tf.float32,
                                             stddev=1e-1), name='weights')
    conv = tf.nn.conv2d(images, kernel, [1, 4, 4, 1], padding='SAME')
    biases = tf.Variable(tf.constant(0.0, shape=[64], dtype=tf.float32),
                         trainable=True, name='biases')
    bias = tf.nn.bias_add(conv, biases)
    conv1 = tf.nn.relu(bias, name=scope)
    print_activations(conv1)
    parameters += [kernel, biases]

  # lrn1
  # Local response normalization over 2 * depth_radius + 1 = 5 adjacent channels.
  with tf.name_scope('lrn1') as scope:
    lrn1 = tf.nn.lrn(conv1, alpha=1e-4, beta=0.75, depth_radius=2, bias=2.0)

  # pool1
  # 3*3 pooling window, stride 2*2
  pool1 = tf.nn.max_pool(lrn1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID')
  print_activations(pool1)

  # conv2
  # 192 kernels of size 5*5*64, stride 1; weights are initialized from a
  # truncated normal distribution (standard deviation 0.1).
  with tf.name_scope('conv2') as scope:
    kernel = tf.Variable(tf.truncated_normal([5, 5, 64, 192], dtype=tf.float32,
                                             stddev=1e-1), name='weights')
    conv = tf.nn.conv2d(pool1, kernel, [1, 1, 1, 1], padding='SAME')
    biases = tf.Variable(tf.constant(0.0, shape=[192], dtype=tf.float32),
                         trainable=True, name='biases')
    bias = tf.nn.bias_add(conv, biases)
    conv2 = tf.nn.relu(bias, name=scope)
    parameters += [kernel, biases]
  print_activations(conv2)

  # lrn2
  with tf.name_scope('lrn2') as scope:
    lrn2 = tf.nn.lrn(conv2, alpha=1e-4, beta=0.75, depth_radius=2, bias=2.0)

  # pool2
  pool2 = tf.nn.max_pool(lrn2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID')
  print_activations(pool2)

  # conv3
  # 384 kernels of size 3*3*192, stride 1; weights are initialized from a
  # truncated normal distribution (standard deviation 0.1).
  with tf.name_scope('conv3') as scope:
    kernel = tf.Variable(tf.truncated_normal([3, 3, 192, 384],
                                             dtype=tf.float32,
                                             stddev=1e-1), name='weights')
    conv = tf.nn.conv2d(pool2, kernel, [1, 1, 1, 1], padding='SAME')
    biases = tf.Variable(tf.constant(0.0, shape=[384], dtype=tf.float32),
                         trainable=True, name='biases')
    bias = tf.nn.bias_add(conv, biases)
    conv3 = tf.nn.relu(bias, name=scope)
    parameters += [kernel, biases]
    print_activations(conv3)

  # conv4
  # 256 kernels of size 3*3*384, stride 1; weights are initialized from a
  # truncated normal distribution (standard deviation 0.1).
  with tf.name_scope('conv4') as scope:
    kernel = tf.Variable(tf.truncated_normal([3, 3, 384, 256],
                                             dtype=tf.float32,
                                             stddev=1e-1), name='weights')
    conv = tf.nn.conv2d(conv3, kernel, [1, 1, 1, 1], padding='SAME')
    biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32),
                         trainable=True, name='biases')
    bias = tf.nn.bias_add(conv, biases)
    conv4 = tf.nn.relu(bias, name=scope)
    parameters += [kernel, biases]
    print_activations(conv4)

  # conv5
  # 256 kernels of size 3*3*256, stride 1; weights are initialized from a
  # truncated normal distribution (standard deviation 0.1).
  with tf.name_scope('conv5') as scope:
    kernel = tf.Variable(tf.truncated_normal([3, 3, 256, 256],
                                             dtype=tf.float32,
                                             stddev=1e-1), name='weights')
    conv = tf.nn.conv2d(conv4, kernel, [1, 1, 1, 1], padding='SAME')
    biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32),
                         trainable=True, name='biases')
    bias = tf.nn.bias_add(conv, biases)
    conv5 = tf.nn.relu(bias, name=scope)
    parameters += [kernel, biases]
    print_activations(conv5)

  # pool5
  pool5 = tf.nn.max_pool(conv5, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID')
  print_activations(pool5)

  return pool5, parameters


def time_tensorflow_run(session, target, info_string):
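  '''
  Measure the average time per batch for running `target`.
  :param session: the TensorFlow Session to run in
  :param target: the op (or list of ops) to time
  :param info_string: label printed with the summary
  :return: None
  '''
  num_steps_burn_in = 10  # warm-up steps; early runs are skewed by GPU memory loading, cache misses, etc.
  total_duration = 0.0      # total time
  total_duration_squared = 0.0  # sum of squared durations, used for the variance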

  for i in xrange(FLAGS.num_batches + num_steps_burn_in):
    start_time = time.time()
    _ = session.run(target)
    duration = time.time() - start_time
    if i >= num_steps_burn_in:
      if not i % 10:
        print ('%s: step %d, duration = %.3f' %
               (datetime.now(), i - num_steps_burn_in, duration))
      total_duration += duration
      total_duration_squared += duration * duration

  mn = total_duration / FLAGS.num_batches
  vr = total_duration_squared / FLAGS.num_batches - mn * mn
  sd = math.sqrt(vr)
  print ('%s: %s across %d steps, %.3f +/- %.3f sec / batch' %
         (datetime.now(), info_string, FLAGS.num_batches, mn, sd))



def run_benchmark():
  '''
  Build the graph on a batch of random images and time the forward and
  forward-backward passes.
  :return: None
  '''
  with tf.Graph().as_default():
    # Generate some dummy images.
    image_size = 224
    # Note that our padding definition is slightly different from cuda-convnet.
    # The upstream comment describes adding 3 to image_size and using VALID
    # padding; as written, this listing keeps image_size = 224 and conv1 uses
    # SAME padding, which yields a 56x56 conv1 output.
    images = tf.Variable(tf.random_normal([FLAGS.batch_size,
                                           image_size,
                                           image_size, 3],
                                          dtype=tf.float32,
                                          stddev=1e-1))

    # Build a Graph that computes the logits predictions from the
    # inference model.
    pool5, parameters = inference(images)

    # Build an initialization operation.
    init = tf.global_variables_initializer()

    # Start running operations on the Graph.
    config = tf.ConfigProto()
    config.gpu_options.allocator_type = 'BFC'
    sess = tf.Session(config=config)
    sess.run(init)

    # Run the forward benchmark.
    time_tensorflow_run(sess, pool5, "Forward")

    # Add a simple objective so we can calculate the backward pass.
    objective = tf.nn.l2_loss(pool5)
    # Compute the gradient with respect to all the parameters.
    grad = tf.gradients(objective, parameters)  # gradients of objective w.r.t. each parameter
    # Run the backward benchmark.
    time_tensorflow_run(sess, grad, "Forward-backward")


def main(_):
  run_benchmark()


if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  parser.add_argument(
      '--batch_size',
      type=int,
      default=128,
      help='Batch size.'
  )
  parser.add_argument(
      '--num_batches',
      type=int,
      default=100,
      help='Number of batches to run.'
  )
  FLAGS, unparsed = parser.parse_known_args()
  tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)

Timing with the LRN layers:
2017-07-28 20:01:13.982322: step 0, duration = 0.095
2017-07-28 20:01:14.926004: step 10, duration = 0.093
2017-07-28 20:01:15.873614: step 20, duration = 0.094
2017-07-28 20:01:16.820571: step 30, duration = 0.095
2017-07-28 20:01:17.770994: step 40, duration = 0.095
2017-07-28 20:01:18.717510: step 50, duration = 0.095
2017-07-28 20:01:19.664164: step 60, duration = 0.094
2017-07-28 20:01:20.614472: step 70, duration = 0.094
2017-07-28 20:01:21.556516: step 80, duration = 0.094
2017-07-28 20:01:22.497904: step 90, duration = 0.094
2017-07-28 20:01:23.360716: Forward across 100 steps, 0.095 +/- 0.001 sec / batch
2017-07-28 20:01:26.080152: step 0, duration = 0.216
2017-07-28 20:01:28.242078: step 10, duration = 0.216
2017-07-28 20:01:30.418645: step 20, duration = 0.217
2017-07-28 20:01:32.583144: step 30, duration = 0.216
2017-07-28 20:01:34.748482: step 40, duration = 0.216
2017-07-28 20:01:36.916634: step 50, duration = 0.216
2017-07-28 20:01:39.073233: step 60, duration = 0.215
2017-07-28 20:01:41.233626: step 70, duration = 0.217
2017-07-28 20:01:43.395616: step 80, duration = 0.216
2017-07-28 20:01:45.557092: step 90, duration = 0.216
2017-07-28 20:01:47.502201: Forward-backward across 100 steps, 0.216 +/- 0.001 sec / batch


Timing without the LRN layers:
2017-07-28 20:03:44.466247: step 0, duration = 0.035
2017-07-28 20:03:44.812274: step 10, duration = 0.034
2017-07-28 20:03:45.158224: step 20, duration = 0.034
2017-07-28 20:03:45.503790: step 30, duration = 0.034
2017-07-28 20:03:45.849637: step 40, duration = 0.034
2017-07-28 20:03:46.195617: step 50, duration = 0.035
2017-07-28 20:03:46.541352: step 60, duration = 0.034
2017-07-28 20:03:46.886702: step 70, duration = 0.035
2017-07-28 20:03:47.232510: step 80, duration = 0.034
2017-07-28 20:03:47.576873: step 90, duration = 0.035
2017-07-28 20:03:47.886823: Forward across 100 steps, 0.035 +/- 0.000 sec / batch
2017-07-28 20:03:49.313215: step 0, duration = 0.099
2017-07-28 20:03:50.310755: step 10, duration = 0.100
2017-07-28 20:03:51.306087: step 20, duration = 0.099
2017-07-28 20:03:52.302013: step 30, duration = 0.100
2017-07-28 20:03:53.296832: step 40, duration = 0.100
2017-07-28 20:03:54.295764: step 50, duration = 0.100
2017-07-28 20:03:55.293681: step 60, duration = 0.100
2017-07-28 20:03:56.292695: step 70, duration = 0.100
2017-07-28 20:03:57.291794: step 80, duration = 0.100
2017-07-28 20:03:58.289415: step 90, duration = 0.100
2017-07-28 20:03:59.187312: Forward-backward across 100 steps, 0.100 +/- 0.000 sec / batch
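
The two summaries above can be compared directly. The sketch below only copies the mean seconds per batch from the logs and divides them. These are single runs on one machine, so the ratios are a rough indication and not a general measurement of LRN cost. The dictionary names are illustrative.

```python
# Mean seconds per batch, copied from the "across 100 steps" summary lines above.
with_lrn = {"forward": 0.095, "forward_backward": 0.216}
without_lrn = {"forward": 0.035, "forward_backward": 0.100}

# Ratio of time with LRN to time without LRN for each phase.
slowdown = {phase: with_lrn[phase] / without_lrn[phase] for phase in with_lrn}

for phase, ratio in slowdown.items():
    print("%s: %.2fx slower with LRN" % (phase, ratio))
```

On these numbers, the LRN layers make the forward pass about 2.7 times slower and the forward-backward pass about 2.2 times slower.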

References:

https://www.cnblogs.com/skyfsm/p/8451834.html

https://blog.csdn.net/u011974639/article/details/76146822

蜡笔小新灬
