Convolutional Neural Networks: Training and Testing on CIFAR-10 (Single GPU)

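With some free time on my hands, I finally ran through the official CIFAR-10 example that I had shelved a while ago. I have only one GPU, so rather than copying the official code wholesale, I modified it based on my own understanding. There are three source files in total; the table below describes what each one does.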

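File	Purpose
cifar10_input.py	Reads the local CIFAR-10 binary files; defines distorted_inputs to fetch training data and inputs to fetch test data
cifar10.py	Builds the convolutional network and defines the loss, training op, and accuracy functions
training.py	Trains and evaluates the CIFAR-10 model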

Full code: https://github.com/skloisMary/CIFAR-10.git

The CIFAR dataset

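CIFAR-10 is a dataset for general object recognition collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. It contains 60000 color images of 32×32 pixels with 3 channels, split into 10 classes of 6000 images each. 50000 images are used for training, organized into 5 training batches of 10000 images each; the remaining 10000 are used for testing and form a single batch.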
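To make the on-disk layout concrete, here is a minimal sketch of how one record in the binary version of the dataset can be decoded. It relies on the published binary format (1 label byte followed by 3072 pixel bytes, stored as a red plane, then green, then blue). The helper names parse_record and read_records are hypothetical and are not the read_cifar10 function from cifar10_input.py.

```python
IMAGE_HEIGHT = 32
IMAGE_WIDTH = 32
IMAGE_DEPTH = 3
LABEL_BYTES = 1
IMAGE_BYTES = IMAGE_HEIGHT * IMAGE_WIDTH * IMAGE_DEPTH  # 3072
RECORD_BYTES = LABEL_BYTES + IMAGE_BYTES                # 3073


def parse_record(record):
    """Split one 3073-byte record into (label, pixels[row][col][channel])."""
    if len(record) != RECORD_BYTES:
        raise ValueError('expected %d bytes, got %d' % (RECORD_BYTES, len(record)))
    label = record[0]
    raw = record[LABEL_BYTES:]
    plane = IMAGE_HEIGHT * IMAGE_WIDTH
    # The file stores the whole red plane, then green, then blue (row-major),
    # so channel c of pixel (row, col) sits at c * plane + row * width + col.
    pixels = [[[raw[c * plane + row * IMAGE_WIDTH + col] for c in range(IMAGE_DEPTH)]
               for col in range(IMAGE_WIDTH)]
              for row in range(IMAGE_HEIGHT)]
    return label, pixels


def read_records(path):
    """Yield (label, pixels) for every record in a CIFAR-10 .bin file."""
    with open(path, 'rb') as f:
        while True:
            record = f.read(RECORD_BYTES)
            if not record:
                break
            yield parse_record(record)


# Synthetic record so the sketch runs without the dataset:
# label 7, red plane all 1, green plane all 2, blue plane all 3.
demo = bytes([7]) + bytes([1]) * 1024 + bytes([2]) * 1024 + bytes([3]) * 1024
demo_label, demo_pixels = parse_record(demo)
print(demo_label, demo_pixels[0][0])
```

With 10000 records per file, each batch file is 10000 × 3073 = 30730000 bytes.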

Model input

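CIFAR is a fairly large dataset, so the data is read from files; if you are unfamiliar with this, see my blog post on TensorFlow's three data input methods. Because the model is trained on the training set and then evaluated on held-out data, I use the multiple input pipelines approach, with two processes that each build their own graph and session:

The training process reads the training data and periodically saves the trained variables to a checkpoint file.
The evaluation process restores an inference model from the checkpoint file; this model reads the evaluation data.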

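cifar10_input.py defines the distorted_inputs function to fetch training data. Each image is first randomly cropped to 24×24 pixels, then randomly flipped left to right, and its brightness and contrast are randomly adjusted, which effectively enlarges the dataset. Finally the image is whitened (standardized to zero mean and unit variance), which makes the model less sensitive to changes in dynamic range.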
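As an illustration of the first two distortions, the sketch below implements a random crop and a random horizontal flip on a plain nested-list image. It is not the TensorFlow implementation; the function names, the seeded random.Random instance, and the dummy image are assumptions made for the example.

```python
import random

CROP_SIZE = 24  # the 24x24 crop size described above


def random_crop(image, size, rng):
    """Cut a size x size window at a random offset from a list-of-rows image."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - size)   # randint is inclusive at both ends
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in image[top:top + size]]


def random_flip_left_right(image, rng):
    """Mirror the image horizontally with probability 0.5."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return image


rng = random.Random(0)
# Dummy 32x32 RGB image whose first two channels encode the pixel position.
image = [[[r, c, 0] for c in range(32)] for r in range(32)]
augmented = random_flip_left_right(random_crop(image, CROP_SIZE, rng), rng)
print(len(augmented), len(augmented[0]), len(augmented[0][0]))  # 24 24 3
```

Because the crop offset and the flip are redrawn every time an image is read, the network rarely sees exactly the same pixels twice.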

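cifar10_input.py defines the inputs function to fetch test data. It crops the central 24×24 region of each image and applies the same whitening step. Unlike distorted_inputs, it also batches the data without shuffling.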
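The two evaluation-time steps can be sketched in plain Python as follows. The standardization mirrors the documented behavior of tf.image.per_image_standardization, which divides by max(stddev, 1/sqrt(N)) so that a uniform image does not cause a division by zero. The function names and the dummy image are assumptions for illustration only.

```python
import math


def central_crop(image, size):
    """Keep the size x size window in the middle of a list-of-rows image."""
    top = (len(image) - size) // 2
    left = (len(image[0]) - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def per_image_standardization(image):
    """Scale one image to zero mean and (approximately) unit variance."""
    values = [float(v) for row in image for pixel in row for v in pixel]
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    # The lower bound on the divisor guards against flat (constant) images.
    adjusted_stddev = max(math.sqrt(variance), 1.0 / math.sqrt(n))
    return [[[(v - mean) / adjusted_stddev for v in pixel] for pixel in row]
            for row in image]


# Dummy 32x32 RGB image with some variation in every channel.
image = [[[(r * 32 + c) % 256, r, c] for c in range(32)] for r in range(32)]
cropped = central_crop(image, 24)
standardized = per_image_standardization(cropped)
print(len(standardized), len(standardized[0]), len(standardized[0][0]))  # 24 24 3
```

For a 32×32 input and a 24×24 output, the crop starts 4 pixels in from the top and left edges.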

import os

import tensorflow as tf

# IMAGE_SIZE, NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN and read_cifar10 are used below
# and must be defined elsewhere in cifar10_input.py.


# Fetch training data
def distorted_inputs(data_dir, batch_size):
    """Read the CIFAR training images and apply preprocessing distortions.
    param data_dir: name of the directory that holds the data
    param batch_size: batch size
    return:
           images: 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3]
           labels: 1D tensor of [batch_size] size
    """
    filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i) for i in range(1, 6)]
    for f in filenames:
        if not tf.gfile.Exists(f):
            raise ValueError('Failed to find file: ' + f)

    # Queue of file names that the reader consumes
    filename_queue = tf.train.string_input_producer(filenames)

    # Data augmentation
    with tf.name_scope('data_augmentation'):
        read_input = read_cifar10(filename_queue)
        reshaped_image = tf.cast(read_input.uint8image, tf.float32)

        height = IMAGE_SIZE
        width = IMAGE_SIZE

        # tf.random_crop crops a random [height, width] window from the image
        distorted_image = tf.random_crop(reshaped_image, [height, width, 3])
        # tf.image.random_flip_left_right randomly flips the image left to right
        distorted_image = tf.image.random_flip_left_right(distorted_image)
        # tf.image.random_brightness randomly adjusts brightness within a range
        distorted_image = tf.image.random_brightness(distorted_image, max_delta=63)
        # tf.image.random_contrast randomly adjusts contrast within a range
        distorted_image = tf.image.random_contrast(distorted_image, lower=0.2, upper=1.8)
        # Standardize the image to zero mean and unit variance (the "whitening" step)
        float_image = tf.image.per_image_standardization(distorted_image)

        float_image.set_shape([height, width, 3])
        read_input.label.set_shape([1])

        # Keep at least 40% of an epoch in the queue so batches are well shuffled
        min_fraction_of_examples_in_queue = 0.4
        min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN * min_fraction_of_examples_in_queue)
    image_batch, label_batch = tf.train.shuffle_batch([float_image, read_input.label], batch_size=batch_size,
                                         capacity=min_queue_examples + 3 * batch_size,
                                         min_after_dequeue=min_queue_examples)
    tf.summary.image('image_batch_train', image_batch)
    return image_batch, tf.reshape(label_batch, [batch_size])
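To get a feel for the queue sizes passed to tf.train.shuffle_batch, the arithmetic can be worked through by hand. The values below are assumptions: 50000 is the training-set size stated in the dataset description, and 128 is the batch size suggested by the label shape mentioned in the evaluation section.

```python
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 50000  # assumed: training-set size from the text
batch_size = 128                          # assumed: batch size mentioned in the text

min_fraction_of_examples_in_queue = 0.4
min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN * min_fraction_of_examples_in_queue)
capacity = min_queue_examples + 3 * batch_size

print(min_queue_examples)  # 20000
print(capacity)            # 20384
```

A larger min_after_dequeue mixes examples more thoroughly, at the cost of more memory and a slower start while the queue fills.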

Model architecture and training

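The architecture is conv1-->pooling1-->norm1-->conv2-->pooling_2-->norm_2-->local3-->local4-->softmax_linear: a multi-layer network built from alternating convolutional and nonlinear layers. It is defined in the inference() function in cifar10.py. The input is a batch of images of shape [batch_size, height, width, 3], and the output is logits, the linear output of softmax_linear, of shape [batch_size, 10]. Because each CIFAR-10 image carries a single integer class label rather than the usual one-hot vector, the loss is defined with tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits).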
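The difference between the sparse and one-hot forms of the loss is only in how the target is written down. The following plain-Python sketch (not the TensorFlow kernel; all function names and the example logits are made up for illustration) computes both for a single row of logits and shows that they agree.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax for one row of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def sparse_cross_entropy(logits, label):
    """Loss from an integer class index: minus the log-probability of that class."""
    return -log_softmax(logits)[label]


def one_hot_cross_entropy(logits, one_hot):
    """Loss from a one-hot target vector: -sum(target * log_prob)."""
    return -sum(t * lp for t, lp in zip(one_hot, log_softmax(logits)))


# One row of a [batch_size, 10] logits tensor, with true class 0.
logits = [2.0, 0.5, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
label = 0
one_hot = [1.0 if i == label else 0.0 for i in range(len(logits))]

sparse_loss = sparse_cross_entropy(logits, label)
dense_loss = one_hot_cross_entropy(logits, one_hot)
print(sparse_loss, dense_loss)
```

The sparse form simply skips building the one-hot vector, which is why integer labels of shape [batch_size] can be fed in directly.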

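The learning rate uses exponential decay. The idea is to start with a relatively large learning rate to reach a reasonably good solution quickly, then lower it as the number of training steps grows so that training becomes more stable. The function has the following form: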

tf.train.exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=True/False)
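Here learning_rate is the initial learning rate, global_step is the current global training step, decay_steps is the number of steps after which the rate has been multiplied by decay_rate once, decay_rate is the decay factor, and staircase controls whether the decay is stepwise. The decayed rate is computed as: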

$decayedLearningRate = learningRate \times decayRate^{\left( globalStep / decaySteps \right)}$

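staircase defaults to False, in which case the learning rate is recomputed at every step. When staircase is True, global_step / decay_steps is truncated to an integer, so the learning rate changes only when global_step reaches a multiple of decay_steps, which gives the schedule its staircase shape.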
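The formula is easy to check with a few lines of plain Python. This is a sketch of the formula only, not the TensorFlow op, and the numbers used (initial rate 0.1, decay_steps 1000, decay_rate 0.5) are illustrative values rather than the constants used in cifar10.py.

```python
def exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False):
    """Return the decayed learning rate at a given global step."""
    if staircase:
        # Integer division: the exponent only changes every decay_steps steps.
        exponent = global_step // decay_steps
    else:
        exponent = global_step / decay_steps
    return learning_rate * decay_rate ** exponent


for step in (0, 500, 999, 1000, 2500):
    smooth = exponential_decay(0.1, step, 1000, 0.5)
    stepped = exponential_decay(0.1, step, 1000, 0.5, staircase=True)
    print(step, round(smooth, 5), round(stepped, 5))
```

With staircase=True the rate stays at 0.1 through step 999, drops to 0.05 at step 1000, and is 0.025 at step 2500; with staircase=False it shrinks a little at every step.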

def loss(logits, labels):
    with tf.name_scope('loss'):
        labels = tf.cast(labels, tf.int64)
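        # logits are usually the output of the network's final layer; labels hold the class index of each example
        # This function takes the integer labels directly, with no one-hot encoding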
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
            logits=logits, labels=labels, name='cross_entropy')
        cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy_mean')
        return cross_entropy_mean

def train_step(loss_value, global_step):
    learning_rate = tf.train.exponential_decay(INITIAL_LEARNING_RATE,
                                               global_step=global_step, decay_steps=DECAY_STEP,
                                               decay_rate=DECAY_RATE, staircase=True)
    tf.summary.scalar('learning_rate', learning_rate)
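    # Passing global_step makes minimize() increment it on every step so the decay
    # schedule advances (drop the argument if the step is incremented elsewhere).
    train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(
        loss_value, global_step=global_step)
    return train_op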

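After two to three hours and 50000 training iterations on a single GPU, the accuracy measured during training hovers between 85% and 90%. The per-step accuracy shown in TensorBoard is plotted in the figure below.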

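The trained variables are saved to a checkpoint file so that the test program can restore the inference model, feed in the test data, and evaluate the model's performance.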

Model evaluation

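Because I only saved the model parameters at the end of training, evaluation restores the variables with saver.restore, rebuilds the model with the inference function, feeds in the test data, and outputs the logits for the test data. Accuracy is then computed with tf.nn.in_top_k(eval_logits, eval_lables, 1), which checks, for each example, whether the true label is among the k highest-scoring predictions and returns a boolean tensor; k is usually 1. The predictions have shape [batch_size, 10] and the true labels have shape [batch_size], here [128]. I do not reuse accuracy_value = session.run(accuracy) from training because, as described above, training and evaluation run separately, each with its own graph and session, so the two cannot be mixed. If they are mixed, accuracy comes out at around 10%, which is no better than random guessing over 10 classes: the trained variables have no effect at all. The final accuracy on the test set is 82.1%.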
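The top-k check can be sketched in plain Python as follows. This is an illustration rather than the TensorFlow kernel: the function mirrors the documented behavior of tf.nn.in_top_k, including its treatment of ties, and the example logits and labels are made up.

```python
def in_top_k(predictions, targets, k):
    """For each row, report whether the target class is among the k best scores."""
    results = []
    for scores, target in zip(predictions, targets):
        # Count classes that score strictly higher than the true class; the
        # target is in the top k when fewer than k classes beat it.
        higher = sum(1 for s in scores if s > scores[target])
        results.append(higher < k)
    return results


# Three examples, three classes (CIFAR-10 would have 10 columns).
logits = [[0.1, 2.3, -0.5],
          [1.2, 0.3, 0.9],
          [0.0, 0.1, 4.0]]
labels = [1, 2, 2]

correct = in_top_k(logits, labels, 1)
accuracy = sum(correct) / len(correct)
print(correct)   # [True, False, True]
print(accuracy)
```

With k=1 this is ordinary classification accuracy: the second example is counted as wrong because class 0 scores higher than the true class 2.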

def evaluation():
    with tf.Graph().as_default():
        n_test = cifar10_input.NUM_EXAMPLES_PER_EPOCH_FOR_EVAL
        eval_images, eval_lables = cifar10_input.inputs(DATA_DIR, BATCH_SIZE)
        eval_logits = cifar10.inference(eval_images)
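        # tf.nn.in_top_k(predictions, targets, k, name=None)
        # Checks whether each example's target label is among its k highest-scoring
        # predictions. k is usually 1, which compares the index of the largest
        # prediction with the label.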
        top_k_op = tf.nn.in_top_k(eval_logits, eval_lables, 1)
        saver = tf.train.Saver()
        with tf.Session() as session:
            ckpt = tf.train.get_checkpoint_state('F:\\tensorflow-CIFAR10\\saver')
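            if ckpt and ckpt.model_checkpoint_path:
                saver.restore(session, ckpt.model_checkpoint_path)
            else:
                # Without restored weights the variables are uninitialized
                print('No checkpoint file found')
                return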
            coord = tf.train.Coordinator()
            threads = tf.train.start_queue_runners(sess=session, coord=coord)
            num_iter = int(n_test / BATCH_SIZE)
            true_count = 0
            for step in range(num_iter):
                predictions = session.run(top_k_op)
                true_count = true_count + np.sum(predictions)
            precision = true_count / (num_iter * BATCH_SIZE)
            print('precision=', precision)
            coord.request_stop()
            coord.join(threads)
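One detail of the loop above is worth working through: num_iter rounds down, so the evaluation does not cover the whole test set. The sketch below assumes n_test is 10000 (the test-set size given in the dataset description) and BATCH_SIZE is 128 (the label shape mentioned in the text).

```python
import math

n_test = 10000     # assumed: test-set size from the dataset description
batch_size = 128   # assumed: batch size mentioned in the text

num_iter = int(n_test / batch_size)   # rounds down to whole batches
evaluated = num_iter * batch_size     # examples actually scored
skipped = n_test - evaluated

print(num_iter, evaluated, skipped)   # 78 9984 16

# Rounding up instead scores more than n_test examples.
num_iter_ceil = math.ceil(n_test / batch_size)
print(num_iter_ceil, num_iter_ceil * batch_size)   # 79 10112
```

Dividing true_count by num_iter * BATCH_SIZE, as the code does, keeps the precision consistent with the number of examples that were actually scored. Rounding up would cover every test image, but if the input queue cycles through the file repeatedly, some images would be counted twice.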