【Tensorflow】深度学习实战06——Tensorflow实现ResNet

前言

ResNet（Residual Neural Network）由前微软研究院的 Kaiming He 等4名华人提出（有兴趣的可以点击这里，查看论文原文），通过使用 Residual Blocks 成功训练152层深的神经网络，在 ILSVRC 2015 比赛中获得了冠军，取得 3.57% 的 top-5 错误率，同时参数量却比 VGGNet 低，效果非常突出。ResNet 的结构可以极快地加速超深神经网络的训练，模型的准确率也有非常大的提升。上一篇博文讲解了 Inception，而 Inception V4 则是将 Inception Module 和 ResNet 相结合。可以看到 ResNet 是一个推广性非常好的网络结构，甚至可以直接应用到 Inception Net 中。

网络原理

在ResNet提出之前，瑞士的Schmidhuber教授提出了Highway Network，原理与ResNet相似。Highway Network的目标就是为了解决极深的神经网络难以训练的问题，因为，通常看来神经网络越深其性能越好，但是网络越深其训练难度也就越大。Highway Network相当于修改了每一层的激活函数，此前的激活函数只是对输入做一个线性变换 $y=H(x,W_{H})$ ，Highway Network则允许保留一定比例的原始输入x，即 $y=H(x,W_{H})*T(x,W_{T})+x*C(x,W_{C})$ 。其中T为变换稀疏，C为保留系数。这样有一定比例的前一层信息不经过矩阵变换和非线性变换，直接传输到下一层，仿佛一层高速公路，因此命名为Highway Network。几百乃至上千层的Highway Network可以直接使用梯度下降算法训练，并且可以配合多种非线性激活函数，使得学习极深的神经网络也变得可行。Highway Network在理论上是可以训练任意深度的网络的，而传统的神经网络结构则对深度十分敏感。
ResNet与Highway Network相似，也是允许前面的信息直接传递都后面的层中。而ResNet最初的灵感来自于这个问题：随着网络不断地加深，会出现一个Degradation的问题，即准确率会上升然后达到饱和，继续持续增加网络深度会导致准确率下降。如下图所示；

试想，当一个网络已经达到了饱和的准确率，那么再在后面加上几个 $y=x$ 的全等映射层起码不会增加错误。这里提到的全等映射的思想就是ResNet的灵感来源。假设某段网络的输入是 $x$ ，期望输出是 $H(x)$ ，如果直接把输入作为输出的初始结果，那么此时网络中训练的就是两者之差 $F(x)=H(x)-x$ 。也就是说残差学习网络学习的不再是一个完整的输出，而是输出对输入做差值 $F(x)$ 。下图是VGG-19、直连的34层网络、ResNet的34层结构的对比。

下图为ResNet中两种残差学习单元的结构，其基础结构相近，两层的残差学习单元中包含了两个相同输出同单数的3x3卷积层；3层学习单元则使用了Inception Net中的1x1的卷积层，并且在中间3x3的卷积前后都使用了1x1卷积。需要注意的是，如果在卷积层中改变了Tensor的维度，需要对输入做一个线性的维度变换，保证能够衔接到后层。（在下方代码中，实现这一功能的函数为conv2d_same）

下图是论文中提到的ResNet不同层数时的网络结构。在使用了ResNet的结构后，Degradation的现象被消除了，而且训练误差会随着层数的增大而减小，在测试集上的表现也会变好。在ResNet提出后不久，Google借鉴了ResNet的精髓，提出了Inception V4和Inception-ResNet-V2，通过融合这两个模型，在ILSVRC数据集上取得了惊人的3.08%的错误率。可见，ResNet 及其思想对卷积神经网络研究的贡献确实非常显著，具有很强的推广性。

为什么残差神经网络能有这么好的效果？

上面提到，当网络参数快要达到稳定时，残差学习单元中的shortcut（即跳跃连接）部分的作用显现出来了。它将网络前方的输入直接传递至后方（相当于卷积层的部分对于 $H（x）$ 的贡献相当小了，残差学习单元此时近似于一个恒等函数），保证了网络结构的稳定。如果，残差学习单元中的卷积层部分有学到一些有用的内容，那么残差学习单元的效果恒等函数更好，网络的性能得到了提升。
在训练的时候，残差学习单元中的shortcut是恒等函数，它学习起来非常容易，并不会影响网络训练的效率。

Tensorflow实现网络

根据《Tensorflow 实战》的内容，我实现了一下152层深度的ResNet。代码中有备注，可以供大家参考。

# -*- coding: utf-8 -*-
"""
Created on Tue Jul  3 09:46:34 2018

@author: most_pan
"""

import collections
import time 
from datetime import datetime
import math

import tensorflow as tf
slim =tf.contrib.slim


class Block(collections.namedtuple('Block',['scope','unit_fn','args'])):
    'A named tuple describing a ResNet block'
#unit_fn是残差学习元生成函数
#args是一个长度等于Block中单元数目的序列，序列中的每个元素
#包含第三层通道数，前两层通道数以及中间层步长(depth, depth_bottleneck, stride)三个量
#在定义一个Block类的对象时，需要提供三个信息，分别是scope，残差学习单元生成函数，以及参数列表



#降采样函数    
def subsample(inputs,factor,scope=None):
    if factor==1:
        return inputs
    else:
        return slim.max_pool2d(inputs,[1,1],stride=factor,scope=scope)

##用于保证维度一致
def conv2d_same(inputs,num_outputs,kernel_size,stride,scope=None):
    if stride==1:
        return slim.conv2d(inputs,num_outputs,kernel_size,stride=1,padding='SAME',scope=scope)
    else:
        pad_total=kernel_size - 1
        pad_beg=pad_total // 2
        pad_end=pad_total-pad_beg
#使用tf.pad对图像进行填充
        inputs = tf.pad(inputs, [[0, 0], [pad_beg, pad_end], [pad_beg, pad_end], [0, 0]])
        return slim.conv2d(inputs, num_outputs, kernel_size, stride=stride,
                        padding='VALID', scope=scope)

@slim.add_arg_scope
def stack_blocks_dense(net,blocks,outputs_collections=None):

    for block in blocks:
        with tf.variable_scope(block.scope,'block',[net])as sc:
            for i,unit in enumerate(block.args):
                with tf.variable_scope('unit_%d'%(i+1),values=[net]):
                    unit_depth,unit_depth_bottleneck,unit_stride=unit    #获取每个Block中的参数，包括第三层通道数，前两层通道数以及中间层步长
#unit_fn是Block类的残差神经元生成函数，它按顺序创建残差学习元并进行连接
                    net=block.unit_fn(net,
                                      depth=unit_depth,
                                      depth_bottleneck=unit_depth_bottleneck,
                                      stride=unit_stride)
            net=slim.utils.collect_named_outputs(outputs_collections,sc.name,net)
    return net


def resnet_arg_scope(is_training=True,
                     weight_decay=0.0001,
                     batch_norm_decay=0.997,
                     batch_norm_epsilon=1e-5,
                     batch_norm_scale=True):
    batch_norm_params = {
      'decay': batch_norm_decay,
      'epsilon': batch_norm_epsilon,
      'scale': batch_norm_scale,
      'updates_collections': tf.GraphKeys.UPDATE_OPS,
      'fused': None,  # Use fused batch norm if possible.
  }

    with slim.arg_scope(
      [slim.conv2d],
      weights_regularizer=slim.l2_regularizer(weight_decay),
      weights_initializer=slim.variance_scaling_initializer(),
      activation_fn=tf.nn.relu,
      normalizer_fn=slim.batch_norm,
      normalizer_params=batch_norm_params):
        with slim.arg_scope([slim.batch_norm], **batch_norm_params):
          with slim.arg_scope([slim.max_pool2d], padding='SAME') as arg_sc:
            return arg_sc


#利用add_arg_scope使bottleneck函数能够直接使用slim.arg_scope设置默认参数
@slim.add_arg_scope
def bottleneck(inputs,depth,depth_bottleneck,stride,
               outputs_collections=None,scope=None):
    with tf.variable_scope(scope,'bottleneck_v2',[inputs])as sc:
#获取输入的通道数目
        depth_in=slim.utils.last_dimension(inputs.get_shape(),min_rank=4)
#先对输入进行batch_norm，再进行非线性激活
        preact=slim.batch_norm(inputs,activation_fn=tf.nn.relu,scope='preact')

#如果残差神经元的输出通道数目和输入的通道数目相同，那么直接对图像进行降采样，以保证shortcut尺寸和经历三个卷积层后的输出的此存相同        
        if depth==depth_in:
            shortcut=subsample(inputs,stride,'shortcut')
#如果残差神经元的输出通道数目和输入的通道数目不同，利用尺寸为1x1的卷积核对输入进行卷积，使输入通道数相同；
        else:
            shortcut=slim.conv2d(preact,depth,[1,1],stride=stride,normalizer_fn=None,
                                 activation_fn=None,scope='shortcut')
#然后，定义三个卷积层           
        residual=slim.conv2d(preact,depth_bottleneck,[1,1],stride=1,scope='conv1')
        residual=conv2d_same(residual,depth_bottleneck,3,stride,scope='conv2')
        residual=slim.conv2d(residual,depth,[1,1],stride=1,normalizer_fn=None,activation_fn=None,scope='conv3')

#将shortcut和residual相加，作为输出        
        output=shortcut+residual

        return slim.utils.collect_named_outputs(outputs_collections,sc.name,output)

#input是输入，blocks包含残差学习元的参数，num_classes是输出分类数，global_pool是是否进行平均池化的标志位；      
def resnet_v2(inputs,
              blocks,
              num_classes=None,
              global_pool=True,
              include_root_block=True,
              reuse=None,
              scope=None):
    with tf.variable_scope(scope,'resnet_v2',[inputs],reuse=reuse) as sc:
        end_points_collection=sc.original_name_scope+'_end_points'
        with slim.arg_scope([slim.conv2d,bottleneck,stack_blocks_dense],
                            outputs_collections=end_points_collection):
            net=inputs
            if include_root_block:
                with slim.arg_scope([slim.conv2d],activation_fn=None,normalizer_fn=None):
#卷积核为7x7步长为2的卷积层
                    net=conv2d_same(net,64,7,stride=2,scope='conv1')
#最大值池化
                net=slim.max_pool2d(net,[3,3],stride=2,scope='pool1')
#调用stack_blocks_dense堆叠50个残差学习元，每个有三个卷积层
            net=stack_blocks_dense(net,blocks)
#先做batch norm然后使用relu激活
            net=slim.batch_norm(net,activation_fn=tf.nn.relu,scope='postnorm')
            if global_pool:     #进行平均池化
                net=tf.reduce_mean(net,[1,2],name='pool5',keep_dims=True)
#一个输出为num_classes的卷积层，不进行激活也不归一正则化。
            if num_classes is not None:
                net=slim.conv2d(net,num_classes,[1,1],activation_fn=None,normalizer_fn=None,scope='logits')
            end_points=slim.utils.convert_collection_to_dict(end_points_collection)

#使用softmax进行分类         
            if num_classes is not None:
                end_points['predictions']=slim.softmax(net,scope='predictions')
            return net,end_points

#152层残差网络   
def resnet_v2_152(inputs,
                  num_classes=None,
                  global_pool=True,
                  reuse=None,
                  scope='resnet_v2_152'):
    blocks=[Block('block1',bottleneck,[(256,64,1)]*2+[(256,64,2)]),
            Block('block2',bottleneck,[(512,128,1)]*7+[(512,128,2)]),
            Block('block3',bottleneck,[(1024,256,1)]*35+[(1024,256,2)]),
            Block('block4',bottleneck,[(2048,512,1)]*3)]

    return resnet_v2(inputs,blocks,num_classes,global_pool,
                     include_root_block=True,reuse=reuse,scope=scope)


#测试性能定义的函数
def time_tensorflow_run(session, target, info_string):
    num_steps_burn_in = 10
    total_duration = 0.0
    total_duration_squared = 0.0

    for i in range(num_batches + num_steps_burn_in):
        start_time = time.time()
        _ = session.run(target)
        duration = time.time() - start_time
        if i >= num_steps_burn_in:
            if not i % 10:
                print('%s: step %d, duration = %.3f' %(datetime.now(), i - num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration

    mn = total_duration / num_batches
    vr = total_duration_squared / num_batches - mn * mn
    sd = math.sqrt(vr)
    print('%s: %s across %d steps, %.3f +/- %.3f sec / batch' %(datetime.now(), info_string, num_batches, mn, sd))



batch_size,height,width=32,224,224

inputs=tf.random_uniform((batch_size,height,width,3))
with slim.arg_scope(resnet_arg_scope(is_training=False)):
    net,end_points=resnet_v2_152(inputs,1000)


init=tf.global_variables_initializer()
sess=tf.Session()
sess.run(init)
num_batches=100
time_tensorflow_run(sess,net,"ForWard")

代码分析

上面代码中，resnet_v2_152函数用于设置网络中的参数，它主要是生成了一个变量blocks，并传递给resnet_v2函数。blocks是一个数组，每个元素都是一个Block类对象，每个对象有三个属性，分别是scope，残差学习单元生成函数bottleneck，以及残差学习单元的参数列表。bottleneck根据列表中的内容生成残差学习单元。其它函数的说明在代码中都有标注，大家可以自行理解。

运行结果

已完。。

参考书籍

《Tensorflow 实战》黄文坚等著；