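ResNet: learning from the TensorFlow code

Network structure explained

ResNet's structure is easy to understand: think of it as a stack of small sub-networks joined by shortcut connections. Intuitively, a shortcut carries the output of an earlier layer past the layers in between and adds it back in further up, so information keeps flowing even in a very deep network.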
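To make the shortcut idea concrete, here is a minimal sketch in plain Python, with no TensorFlow. It assumes a toy block whose residual branch F is two small dense layers with a ReLU in between; the helper names (dense, relu, residual_block) are made up for illustration and are not taken from resnet_model.py.

```python
def dense(x, weights):
    """Multiply vector x by a square weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in x]


def residual_block(x, w1, w2):
    """Return F(x) + x, where F is dense -> ReLU -> dense."""
    fx = dense(relu(dense(x, w1)), w2)
    return [f + v for f, v in zip(fx, x)]  # the shortcut adds the input back


x = [1.0, -2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
# With all-zero weights F(x) is zero, so the input passes through unchanged.
print(residual_block(x, zeros, zeros))
```

Even when the residual branch contributes nothing, the block still passes its input along, which is the sense in which the shortcut keeps information flowing.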

Code structure

We study the code under tensorflow/models/official/resnet, which consists mainly of the following three files:

cifar10_main.py
resnet_model.py
resnet_run_loop.py

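The code uses modules such as Estimator and data that are available in TensorFlow 1.4, and it mainly has to implement the following pieces:

the model() function, which outputs the logits
the model_fn() function, which outputs a tf.estimator.EstimatorSpec() instance
the input_fn() function, which outputs (image, label)

The model function implements the number of layers and the network design, model_fn sets a number of hyperparameters, and input_fn() mainly preprocesses the input data.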

model()

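The model is implemented as a class: the model's parameters become attributes of the instance, and implementing __call__ makes the instance callable like a function, so that calling it outputs the logits.
In __init__, the parameters used by the model are assigned: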

self.resnet_size = resnet_size  # the source code uses resnet_size to control the network size
self.resnet_version = version  # which of the two papers' variants to implement
self.data_format = data_format  # 'channels_last' or 'channels_first'
self.num_classes = num_classes  # number of output classes
self.num_filters = num_filters  # number of output channels of the first conv layer
self.kernel_size = kernel_size  # convolution kernel size
self.conv_stride = conv_stride  # convolution stride
self.first_pool_size = first_pool_size  # pool size of the first of the two pooling layers (max pooling); None means no pooling
self.first_pool_stride = first_pool_stride  # ignored when first_pool_size is None
self.second_pool_size = second_pool_size  # appears to be unused in the code?
self.second_pool_stride = second_pool_stride  # stride of the second pooling layer
self.block_sizes = block_sizes  # a list; each element is the number of blocks in one block_layer
self.block_strides = block_strides  # a list; the stride used by each block_layer
self.final_size = final_size  # a number: the channel count output by the last block_layer; it must be consistent with num_filters and block_sizes
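The pattern described above, configuration stored in __init__ and the forward pass in __call__, can be sketched with a toy class. TinyModel and its scale parameter are hypothetical stand-ins; the real class builds convolution layers, while this one only returns one number per class so the calling convention is visible.

```python
class TinyModel(object):
    """Hypothetical stand-in: config in __init__, forward pass in __call__."""

    def __init__(self, num_classes, scale=1.0):
        self.num_classes = num_classes
        self.scale = scale

    def __call__(self, inputs, training):
        # A real model would apply conv layers here and could use `training`
        # to switch batch-norm behavior; this toy version ignores it.
        total = sum(inputs) * self.scale
        return [total * (k + 1) for k in range(self.num_classes)]


model = TinyModel(num_classes=3)
logits = model([0.5, 1.5], training=False)  # the instance is called like a function
print(logits)
```

Because the instance is callable, code that only needs "something that maps inputs to logits" does not care whether it receives a plain function or a configured model object.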

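__call__ implements the inference function. It has three parts: the first implements the initial conv layer from the paper, the second implements the sequence of block_layers (a for loop), and the third implements the average pooling and the logits.
When writing our own code we can subclass this base class, which simplifies the model implementation; the main job is just choosing the parameters. For example: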

class MstarModel(resnet_model.Model):
    """ResNet model for the MSTAR dataset."""

    def __init__(self, resnet_size, data_format=None, num_classes=_NUM_CLASSES,
                 version=resnet_model.DEFAULT_VERSION):
        if resnet_size % 6 != 2:
            raise ValueError("resnet_size must be 6n+2: ", resnet_size)

        num_blocks = (resnet_size - 2) // 6

        super(MstarModel, self).__init__(
            resnet_size=resnet_size,
            bottleneck=True,
            num_classes=num_classes,
            num_filters=16,
            kernel_size=3,
            conv_stride=1,
            first_pool_size=None,  # pool size to be used for the first pooling layer.
                                   # if None, the first pooling layer is skipped.
            first_pool_stride=None,
            second_pool_size=2,
            second_pool_stride=1,
            block_sizes=[num_blocks] * 3,
            block_strides=[1, 2, 2],  # List of integers representing the desired stride size for each of the sets of block layers.
                                      # Should be the same length as block_sizes.
            final_size=256,  # the expected size of the model after the second pooling.
            version=version,
            data_format=data_format  # ('channels_last', 'channels_first', None)
        )

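Some of the parameters are fixed inside __init__, which pins down this particular sub-model; the rest are supplied when the class is instantiated, since a model for a specific application still needs a few adjustable parameters.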
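The arithmetic behind num_blocks and final_size can be checked with a small sketch. The rule in expected_final_size is an assumption based on my reading of the base class (the filter count doubles with each block layer, and bottleneck blocks output four times their filter count); both helper names are hypothetical.

```python
def blocks_per_layer(resnet_size):
    """Number of blocks in each of the three block layers of a 6n+2 network."""
    if resnet_size % 6 != 2:
        raise ValueError("resnet_size must be 6n+2, got %d" % resnet_size)
    return (resnet_size - 2) // 6


def expected_final_size(num_filters, num_block_layers, bottleneck):
    """Assumed rule: filters double per block layer; bottleneck outputs 4x."""
    filters = num_filters * 2 ** (num_block_layers - 1)
    return filters * 4 if bottleneck else filters


print(blocks_per_layer(32))               # 5
print(expected_final_size(16, 3, True))   # 256
print(expected_final_size(16, 3, False))  # 64
```

Under that assumption, num_filters=16 with three block layers and bottleneck=True gives 16 * 4 * 4 = 256, which matches the final_size=256 passed above.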

model_fn()

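The job of model_fn is to implement different behavior for each mode and return a tf.estimator.EstimatorSpec() instance. Here, though, we can simply call the resnet_run_loop.resnet_model_fn template. Inside model_fn we supply some hyperparameters and configure the learning_rate, loss, optimizer, and so on.

input_fn()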

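The first step is to set the paths of the tfrecord files, which can be done with a get_filenames function.
The second step is to implement the function that parses the tfrecord files, i.e. the _parser passed to dataset.map(); this can be done with a custom parse_dataset function.
As with model_fn, input_fn can be implemented by calling resnet_run_loop.process_record_dataset.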
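The first step above, resolving the tfrecord paths, might look like the sketch below. The file names train.tfrecords and test.tfrecords are hypothetical placeholders, and check_filenames is a helper added here for illustration; use whatever names your own conversion script wrote.

```python
import os


def get_filenames(is_training, data_dir):
    """Return the list of tfrecord paths for the requested split.

    The file names are hypothetical; adjust them to your own data.
    """
    name = 'train.tfrecords' if is_training else 'test.tfrecords'
    return [os.path.join(data_dir, name)]


def check_filenames(filenames):
    """Fail early, naming the offending paths, if any file is missing."""
    missing = [f for f in filenames if not os.path.exists(f)]
    if missing:
        raise IOError('tfrecord files not found: %s' % ', '.join(missing))
    return filenames


print(get_filenames(True, 'data'))
```

Checking the paths before building the dataset gives a clearer error than the one raised later, once the input pipeline starts reading.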

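Parameter settings

In the main function, parser.set_defaults() lets you set many parameters. If you need a new one, for example a directory for a separate validation dataset, add it with self.add_argument() in the resnet_run_loop.ResnetArgParser() class, e.g.: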

class ResnetArgParser(argparse.ArgumentParser):
    """Arguments for configuring and running a Resnet Model."""

    def __init__(self, resnet_size_choices=None):
        super(ResnetArgParser, self).__init__(parents=[
            parsers.BaseParser(),
            parsers.PerformanceParser(),
            parsers.ImageModelParser(),
            parsers.ExportParser(),
            parsers.BenchmarkParser(),
        ])

        self.add_argument(
            '--mode', type=str, default='train',
            choices=['train', 'eval', 'eval_1by1', 'test'],
            help="choose to train or validation or test"
        )
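The same mechanism can be tried in isolation with only the standard library. ModeArgParser below is a hypothetical, trimmed-down parser without the parent parsers; the --eval_data_dir flag and its default path are assumptions that illustrate the "new validation directory" case mentioned above.

```python
import argparse


class ModeArgParser(argparse.ArgumentParser):
    """Hypothetical, trimmed-down parser that only shows the added flags."""

    def __init__(self):
        super(ModeArgParser, self).__init__()
        self.add_argument(
            '--mode', type=str, default='train',
            choices=['train', 'eval', 'eval_1by1', 'test'],
            help='Whether to train, evaluate, evaluate per class, or test.')
        self.add_argument(
            '--eval_data_dir', type=str, default='/tmp/mstar_eval',
            help='Directory holding the evaluation tfrecord files.')


parser = ModeArgParser()
parser.set_defaults(mode='eval')  # parser-level default overrides the argument default
flags = parser.parse_args(['--eval_data_dir', '/data/mstar'])
print(flags.mode, flags.eval_data_dir)
```

Values given on the command line still take precedence over anything set through set_defaults().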

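The tf.estimator.Estimator() call that wraps model_fn takes five parameters. They serve as model hyperparameters (some are auxiliary) and need to be set properly.

Extending the code

The original code evaluates on the test set while training. But what if we want to run testing on its own after training has finished? For that we need to extend the control flow so that the script decides at run time whether to train, validate, or test. We can modify the code as follows.
First, as shown above, add a command-line argument --mode whose choices are train, eval, eval_1by1, and test. Then modify the resnet_main function in resnet_run_loop:
wrap the original code segment in an if flags.mode == 'train': block: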

if flags.mode == 'train':
    for _ in range(flags.train_epochs // flags.epochs_between_evals):
        train_hooks = hooks_helper.get_train_hooks(
            flags.hooks,
            batch_size=flags.batch_size,
            benchmark_log_dir=flags.benchmark_log_dir)

        print('Starting a training cycle.')

        def input_fn_train():
            return input_function(True, flags.data_dir, flags.batch_size,
                                  flags.epochs_between_evals,
                                  flags.num_parallel_calls, flags.multi_gpu)

        classifier.train(input_fn=input_fn_train, hooks=train_hooks,
                         max_steps=flags.max_train_steps)

        print('Starting to evaluate.')

        # Evaluate the model and print results
        def input_fn_eval():
            return input_function(False, flags.data_dir, flags.batch_size,
                                  1, flags.num_parallel_calls, flags.multi_gpu)

        # flags.max_train_steps is generally associated with testing and profiling.
        # As a result it is frequently called with synthetic data, which will
        # iterate forever. Passing steps=flags.max_train_steps allows the eval
        # (which is generally unimportant in those circumstances) to terminate.
        # Note that eval will run for max_train_steps each loop, regardless of the
        # global_step count.
        eval_results = classifier.evaluate(input_fn=input_fn_eval,
                                           steps=flags.max_train_steps)
        print(eval_results)

        if benchmark_logger:
            benchmark_logger.log_estimator_evaluation_result(eval_results)
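One detail of the loop above is worth checking by hand: the number of train/eval cycles comes from an integer division, so any remainder is silently dropped. A minimal sketch of that arithmetic, where training_schedule is a hypothetical helper:

```python
def training_schedule(train_epochs, epochs_between_evals):
    """Return (number of train/eval cycles, epochs actually trained)."""
    cycles = train_epochs // epochs_between_evals
    return cycles, cycles * epochs_between_evals


print(training_schedule(250, 10))  # (25, 250)
print(training_schedule(25, 10))   # (2, 20): the last 5 epochs are never run
print(training_schedule(5, 10))    # (0, 0): no training at all
```

So it is safest to choose train_epochs as a multiple of epochs_between_evals.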

Note that the construction of classifier, and the code above it, stays in the outermost scope. Next we implement the eval scope:

if flags.mode == 'eval':

    def input_fn_eval():
        return input_function(False, flags.data_dir, flags.batch_size,
                              1, flags.num_parallel_calls, flags.multi_gpu)

    eval_result = classifier.evaluate(
        input_fn=input_fn_eval, steps=flags.max_train_steps)

    print(eval_result)

    if benchmark_logger:
        benchmark_logger.log_estimator_evaluation_result(eval_result)

With that in place, running the script with one of the following commands

python **.py --mode train
python **.py --mode eval

selects whether to train or to evaluate.

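A separate feature: getting the prediction accuracy of each class

Papers on the MSTAR dataset almost always include a table showing the accuracy for each class. The current implementation feeds the whole test set in and outputs a single accuracy value, so how do we change it to report the accuracy of each class?
The approach is simple: build a separate tfrecord file for each class, making sure the class labels match those used in the training set. For MSTAR this gives 10 tfrecord files; we loop over them, evaluate each one, and save the resulting accuracy: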

if flags.mode == 'eval_1by1':
    # point to the eval dataset path

    def preprocess_image(image, is_training):
        if is_training:
            image = tf.image.resize_image_with_crop_or_pad(
                image, IMG_HEIGHT + 8, IMG_WIDTH + 8
            )

            image = tf.random_crop(image, [IMG_HEIGHT, IMG_WIDTH, IMG_CHANNEL])

        image = tf.image.per_image_standardization(image)

        return image

    def eval_input_fn(filename):
        if not os.path.exists(filename):
            raise ValueError("no such file: {}".format(filename))

        def _parser(example_proto):
            features = {'label': tf.FixedLenFeature([], tf.int64),
                        'img_raw': tf.FixedLenFeature([], tf.string)}
            parsed_features = tf.parse_single_example(example_proto, features=features)
            image = tf.decode_raw(parsed_features['img_raw'], tf.uint8)
            image = tf.reshape(image, [IMG_HEIGHT, IMG_WIDTH, IMG_CHANNEL])
            # image = tf.cast(image, tf.float32) * (1. / 255) - 0.5
            image = preprocess_image(image, False)
            label = tf.cast(parsed_features['label'], tf.int32)
            label = tf.one_hot(label, NUM_CLASS)
            return image, label

        dataset = tf.data.TFRecordDataset(filename)
        dataset = dataset.map(_parser)

        dataset = dataset.batch(BATCH_SIZE)
        iterator = dataset.make_one_shot_iterator()
        next_image, next_label = iterator.get_next()

        return next_image, next_label

    eval_dic = {
        0: '2S1',
        1: 'BMP2',
        2: 'BRDM_2',
        3: 'BTR60',
        4: 'BTR70',
        5: 'D7',
        6: 'T62',
        7: 'T72',
        8: 'ZIL131',
        9: 'ZSU_23_4'
    }
    time_now = datetime.datetime.now()
    with open("validation accuracy.txt", 'a') as f:
        f.writelines(str(time_now) + ':\n')
    for i in range(10):
        eval_result = classifier.evaluate(
            input_fn=lambda: eval_input_fn(
                os.path.join(flags.eval_data_dir,  'MSTAR10_test_' + eval_dic[i] + '.tfrecords')),
            steps=flags.max_train_steps)

        accuracy_score = eval_result['accuracy']
        print("\nTest Accuracy: {0:f}\n".format(accuracy_score))
        with open("validation accuracy.txt", 'a') as f:
            f.write("{}'s accuracy: {}\n".format(eval_dic[i], accuracy_score))

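One thing to watch out for: the image preprocessing used at evaluation time must be identical to the one used in training, otherwise the predictions will be very poor (really very poor).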
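As an alternative to one tfrecord file per class, per-class accuracy can also be computed from a single pass over the whole test set, provided the true labels and predicted labels are collected as integer lists, for instance from the estimator's predict method. This is a different technique from the loop above, not the approach used in this post; the labels and predictions below are made-up toy values.

```python
from collections import Counter


def per_class_accuracy(labels, predictions, class_names):
    """Map each class name to the fraction of its samples predicted correctly."""
    totals = Counter(labels)
    correct = Counter(l for l, p in zip(labels, predictions) if l == p)
    return {class_names[k]: correct[k] / float(totals[k])
            for k in sorted(totals)}


names = {0: '2S1', 1: 'BMP2', 2: 'BRDM_2'}
labels = [0, 0, 1, 1, 1, 2]          # toy ground truth
predictions = [0, 1, 1, 1, 0, 2]     # toy model output
accuracy = per_class_accuracy(labels, predictions, names)
print(accuracy)
```

The trade-off is that this needs raw predictions rather than the aggregated metric returned by evaluate, but it avoids maintaining ten extra data files.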
