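This post uses the LeNet-5 model to recognize MNIST handwritten digits. The network architecture is as follows:
[Figure: LeNet-5 network architecture]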

I. The structure of each layer of the LeNet-5 model

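Layer 1: convolutional layer

This layer takes the raw image pixels as input; the input size is 32*32*1. It uses 6 filters of size 5*5 with no zero padding and a stride of 1. The output side length is 32-5+1=28, so the output is a 28*28*6 matrix. The layer has 5*5*1*6+6=156 parameters (5*5*1*6 filter weights plus 6 biases). The next layer has 28*28*6=4704 nodes, and each node is connected to 5*5=25 nodes of the current layer plus one bias, so this convolutional layer has 4704*(25+1)=122304 connections.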
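The arithmetic above can be checked with a few lines of plain Python. This is only a sketch: the helper name `conv_layer_stats` is made up for illustration, and it assumes a square input and no zero padding, as in the description of layer 1.

```python
def conv_layer_stats(in_size, in_depth, filter_size, num_filters, stride=1):
    """Return (output side length, parameter count, connection count)
    for a convolution without zero padding on a square input."""
    out_size = (in_size - filter_size) // stride + 1
    weights_per_filter = filter_size * filter_size * in_depth
    # Every filter has its own weights plus one bias.
    params = weights_per_filter * num_filters + num_filters
    # Every output node connects to one filter window plus one bias.
    nodes = out_size * out_size * num_filters
    connections = nodes * (weights_per_filter + 1)
    return out_size, params, connections


out_size, params, connections = conv_layer_stats(
    in_size=32, in_depth=1, filter_size=5, num_filters=6)
print(out_size, params, connections)  # 28 156 122304
```

The connection count is much larger than the parameter count because the same 156 parameters are reused at every position of the output map.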

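Layer 2: pooling layer

The input to this layer is the output of layer 1, a 28*28*6 node matrix. The filter size is 2*2 and the stride is 2 in both height and width, so the output side length is (28-2+1)/2=13.5, rounded up to 14, which gives a 14*14*6 matrix. The pooling filter in the original LeNet-5 model differs slightly from the one used in the book; that difference is not covered here.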
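The rounding step is easy to get wrong, so here is the same formula as a small sketch. The function name `pool_output_size` is hypothetical; the formula is the one used in the text for a window that is not zero padded.

```python
import math


def pool_output_size(in_size, filter_size, stride):
    """Output side length without zero padding:
    ceil((in_size - filter_size + 1) / stride)."""
    return math.ceil((in_size - filter_size + 1) / stride)


print(pool_output_size(28, 2, 2))  # 14, the first pooling layer
print(pool_output_size(10, 2, 2))  # 5, the second pooling layer
```

Pooling acts on each channel separately, so the depth of 6 is unchanged and only the height and width shrink.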

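Layer 3: convolutional layer

The input is a 14*14*6 matrix. This layer uses 16 filters of size 5*5 with no zero padding and a stride of 1. The output side length is 14-5+1=10, so the output is a 10*10*16 matrix.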

Layer 4: pooling layer

The input is a 10*10*16 matrix. With a 2*2 filter and a stride of 2, the output is a 5*5*16 matrix.

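Layer 5: fully connected layer

The input is a 5*5*16 matrix. The LeNet-5 paper calls this layer a convolutional layer, but its 5*5 filter covers the whole spatial extent of the input, so it is no different from a fully connected layer and is treated as one here. It has 120 output nodes and 5*5*16*120+120=48120 parameters in total.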
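To see why the two views of this layer coincide, one can count the parameters both ways. The sketch below uses hypothetical helper names and assumes no zero padding and a stride of 1.

```python
def conv_output_size(in_size, filter_size, stride=1):
    """Output side length of a convolution without zero padding."""
    return (in_size - filter_size) // stride + 1


def conv_param_count(filter_size, in_depth, out_depth):
    """Filter weights plus one bias per output channel."""
    return filter_size * filter_size * in_depth * out_depth + out_depth


def fc_param_count(in_nodes, out_nodes):
    """Weight matrix plus one bias per output node."""
    return in_nodes * out_nodes + out_nodes


# Viewed as a convolution: a 5*5 filter over a 5*5*16 input, 120 filters.
as_conv = conv_param_count(filter_size=5, in_depth=16, out_depth=120)
# Viewed as a fully connected layer: 5*5*16 = 400 inputs, 120 outputs.
as_fc = fc_param_count(in_nodes=5 * 5 * 16, out_nodes=120)

print(conv_output_size(5, 5), as_conv, as_fc)  # 1 48120 48120
```

The convolution produces a 1*1*120 output, that is, 120 numbers that each depend on all 400 inputs, which is exactly what a fully connected layer computes.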

Layer 6: fully connected layer

This layer has 120 input nodes and 84 output nodes, for a total of 120*84+84=10164 parameters.

Layer 7: fully connected layer

This layer has 84 input nodes and 10 output nodes, for a total of 84*10+10=850 parameters.
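Adding up the per-layer counts gives a feel for how small this network is. The sketch below is a tally only. The count for layer 3 is not stated in the text; it is computed here with the same formula as layer 1, under the assumption that every one of the 16 filters sees all 6 input channels.

```python
# (layer name, parameter count); pooling layers are treated as parameter-free.
layers = [
    ("layer 1 conv", 5 * 5 * 1 * 6 + 6),
    ("layer 3 conv", 5 * 5 * 6 * 16 + 16),  # assumption: full 6-to-16 connectivity
    ("layer 5 fc", 5 * 5 * 16 * 120 + 120),
    ("layer 6 fc", 120 * 84 + 84),
    ("layer 7 fc", 84 * 10 + 10),
]

for name, count in layers:
    print("%-13s %6d" % (name, count))

total = sum(count for _, count in layers)
print("total         %6d" % total)  # total          61706
```

Most of the parameters sit in layer 5, the first fully connected layer, even though the convolutional layers do most of the feature extraction.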

II. Implementing the network in TensorFlow

1. Import the modules and the data

import tensorflow as tf
import numpy as np
import os
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('../datasets/MNIST_data/', one_hot=True)

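2. Configure the parameters that describe the network structure

INPUT_NODE = 784  # number of input nodes: each image is 28*28*1, one node per pixel, 784 in total
OUTPUT_NODE = 10  # number of output nodes, one for each digit 0-9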

IMAGE_SIZE = 28
NUM_CHANNELS = 1
NUM_LABELS = 10 


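# Configuration of the first convolutional layer
CONV1_DEEP = 32
CONV1_SIZE = 5
# Configuration of the second convolutional layer
CONV2_DEEP = 64
CONV2_SIZE = 5
# Number of nodes in the fully connected layer
FC_SIZE = 512

BATCH_SIZE = 100  # batch size
LEARNING_RATE_BASE = 0.01  # base learning rate
LEARNING_RATE_DECAY = 0.99  # learning rate decay rate
REGULARIZATION_RATE = 0.0001  # L2 regularization coefficient
TRAINING_STEPS = 6000  # number of training steps
MOVING_AVERAGE_DECAY = 0.99  # moving average decay rate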

3. Define the forward propagation

def inference(x, train, regularizer):
    # Layer 1: convolutional layer
    with tf.variable_scope('layer1-conv1'):
        conv1_weights = tf.get_variable(
            "weight", [CONV1_SIZE, CONV1_SIZE, NUM_CHANNELS, CONV1_DEEP],
            initializer=tf.truncated_normal_initializer(stddev=0.1))
        conv1_biases = tf.get_variable("bias", [CONV1_DEEP],
                                       initializer=tf.constant_initializer(0.0))
        conv1 = tf.nn.conv2d(x, conv1_weights, strides=[1, 1, 1, 1], padding="SAME")
        # strides has the form [stride_batch, stride_height, stride_width, stride_channels]:
        # the first element is the step across samples in the batch, the second and third
        # are the steps along the height and width of the feature map, and the fourth is
        # the step across channels.
        relu1 = tf.nn.relu(tf.nn.bias_add(conv1, conv1_biases))

    # Layer 2: pooling layer
    with tf.variable_scope('layer2-pool1'):
        pool1 = tf.nn.max_pool(relu1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")
        # ksize is the pooling window size, a 4-element list [batch, height, width, channels].
        # We do not pool across the batch or the channels, so those two entries are 1.

    # Layer 3: convolutional layer
    with tf.variable_scope('layer3-conv2'):
        conv2_weights = tf.get_variable(
            "weight", [CONV2_SIZE, CONV2_SIZE, CONV1_DEEP, CONV2_DEEP],
            initializer=tf.truncated_normal_initializer(stddev=0.1))
        conv2_biases = tf.get_variable("bias", [CONV2_DEEP],
                                       initializer=tf.constant_initializer(0.0))
        conv2 = tf.nn.conv2d(pool1, conv2_weights, strides=[1, 1, 1, 1], padding="SAME")
        relu2 = tf.nn.relu(tf.nn.bias_add(conv2, conv2_biases))

    # Layer 4: pooling layer
    with tf.variable_scope('layer4-pool2'):
        pool2 = tf.nn.max_pool(relu2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

    # Flatten the pooled feature maps so they can feed the fully connected layer 5.
    # Manual alternative, kept for reference (it needs a fixed batch size, because
    # pool_shape[0] is None when the batch dimension of the placeholder is None):
    #     pool_shape = pool2.get_shape().as_list()
    #     nodes = pool_shape[1] * pool_shape[2] * pool_shape[3]
    #     reshape = tf.reshape(pool2, [pool_shape[0], nodes])
    with tf.variable_scope('flatten'):
        fla = tf.contrib.layers.flatten(pool2)
        nodes = fla.shape[1]

    # Layer 5: fully connected layer
    with tf.variable_scope('layer5-fc1'):
        fc1_weights = tf.get_variable("weight", [nodes, FC_SIZE],
                                      initializer=tf.truncated_normal_initializer(stddev=0.1))
        # Only the fully connected weights are added to the regularization loss
        if regularizer is not None:
            tf.add_to_collection('losses', regularizer(fc1_weights))

        fc1_biases = tf.get_variable('bias', [FC_SIZE], initializer=tf.constant_initializer(0.1))
        fc1 = tf.nn.relu(tf.matmul(fla, fc1_weights) + fc1_biases)
        # Dropout (keep probability 0.5) is applied only when train is True
        if train:
            fc1 = tf.nn.dropout(fc1, 0.5)

    # Layer 6: output layer
    with tf.variable_scope('layer6-fc2'):
        fc2_weights = tf.get_variable("weight", [FC_SIZE, NUM_LABELS],
                                      initializer=tf.truncated_normal_initializer(stddev=0.1))
        if regularizer is not None:
            tf.add_to_collection('losses', regularizer(fc2_weights))
        fc2_biases = tf.get_variable("bias", [NUM_LABELS], initializer=tf.constant_initializer(0.1))
        logit = tf.matmul(fc1, fc2_weights) + fc2_biases


    return logit
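Note that this implementation differs from the layer-by-layer description in part I: the input is 28*28*1, the convolutions use padding="SAME", and the layers are wider. As a rough sketch of what that means for tensor sizes, the plain-Python trace below assumes TensorFlow's rule that with "SAME" padding the output side length is ceil(input / stride), independent of the filter size. The helper name `same_output_size` is made up for illustration; the constants repeat the configuration from step 2.

```python
import math

IMAGE_SIZE = 28
CONV2_DEEP = 64
FC_SIZE = 512


def same_output_size(in_size, stride):
    """Output side length under "SAME" padding: ceil(in_size / stride)."""
    return math.ceil(in_size / stride)


size = IMAGE_SIZE
size = same_output_size(size, 1)  # layer1-conv1: 28 -> 28
size = same_output_size(size, 2)  # layer2-pool1: 28 -> 14
size = same_output_size(size, 1)  # layer3-conv2: 14 -> 14
size = same_output_size(size, 2)  # layer4-pool2: 14 -> 7

nodes = size * size * CONV2_DEEP        # length of the flattened vector
fc1_params = nodes * FC_SIZE + FC_SIZE  # weights plus biases of layer5-fc1

print(size, nodes, fc1_params)  # 7 3136 1606144
```

So `nodes` in `inference` should come out as 3136, and the first fully connected layer alone holds about 1.6 million parameters, far more than the whole network described in part I.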

4. Define backpropagation and the training process (including the loss computation and the final prediction)


def train(mnist):
    """Train the model."""
    x = tf.placeholder(tf.float32, shape= [None, IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS], name="x-input")
    y_ = tf.placeholder(tf.float32, shape=[None, OUTPUT_NODE], name="y-input")

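    # Define the regularizer
    regularizer = tf.contrib.layers.l2_regularizer(REGULARIZATION_RATE)
    # Forward propagation to compute y (train=False, so dropout is not applied)
    y = inference(x, False, regularizer)
    # Variable that counts the training steps. trainable=False marks it as not trainable,
    # which also keeps it out of the moving-average computation
    global_step = tf.Variable(0, trainable=False)

    # Initialize the moving-average class with the decay rate and the step variable.
    # Supplying the step variable makes the averages update faster early in training
    variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY,
                                                          global_step)
    # tf.trainable_variables() returns the list of all trainable variables;
    # moving averages are applied to all of them
    variables_averages_op = variable_averages.apply(tf.trainable_variables())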

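    # Define the loss function.
    # Each label in y_ is a one-hot vector of length 10 (the maximum is 1, all other
    # entries are 0), so argmax along axis 1 returns the class index that
    # sparse_softmax_cross_entropy_with_logits expects
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y,
                                                                   labels=tf.argmax(y_, 1))
    # Mean cross entropy over the batch
    cross_entropy_mean = tf.reduce_mean(cross_entropy)

    # Add the regularization loss.
    # get_collection returns the list of all values in the "losses" collection,
    # and add_n sums them
    loss = cross_entropy_mean + tf.add_n(tf.get_collection("losses"))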

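    # Learning rate with exponential decay
    learning_rate = tf.train.exponential_decay(
        LEARNING_RATE_BASE,
        global_step,
        mnist.train.num_examples / BATCH_SIZE,
        LEARNING_RATE_DECAY,
        staircase=True
    )

    # Minimize the loss.
    # global_step starts at 0 and is incremented by 1 on every parameter update,
    # so it counts the updates.
    # minimize() returns the op that applies the gradients and increments global_step
    train_step = tf.train.GradientDescentOptimizer(
        learning_rate).minimize(loss, global_step=global_step)

    # After backpropagation updates the parameters, the moving average of each parameter
    # must be updated too; the control dependency below groups both operations into one op
    with tf.control_dependencies([train_step, variables_averages_op]):
        train_op = tf.no_op(name="train")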

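    # y is the computed prediction and y_ is the correct answer; argmax gives the index
    # of the answer, which is the digit itself.
    # equal() returns True where the two answers match and False otherwise
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    # cast() converts the booleans to floats; reduce_mean then gives the accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))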

    # Start a session and run the computation
    with tf.Session() as sess:
        # Initialize all global variables
        tf.global_variables_initializer().run()
        for i in range(TRAINING_STEPS):
            # next_batch() is the batching helper of the TensorFlow MNIST dataset object
            xs, ys = mnist.train.next_batch(BATCH_SIZE)
            xs = np.reshape(xs, (BATCH_SIZE, IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS))
            _, loss_value, step = sess.run([train_op, loss, global_step],
                                           feed_dict={x: xs, y_: ys})
            if i % 1000 == 0:
                print("After %d training step(s), loss on training batch is %g" % (step, loss_value))
                # Accuracy on a batch of 1000 test images
                test_x, test_y = mnist.test.next_batch(1000)
                test_x = test_x.reshape(-1, IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS)
                test_acc = accuracy.eval(feed_dict={x: test_x, y_: test_y})
                print(test_acc)
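Two quantities in `train` depend on `global_step`: the decayed learning rate and the effective decay of the moving average. The plain-Python sketch below reproduces the documented formulas of `tf.train.exponential_decay` (with staircase=True) and `tf.train.ExponentialMovingAverage` (when a step variable is supplied). The function names are made up for illustration, and DECAY_STEPS assumes the default MNIST split of 55,000 training examples.

```python
LEARNING_RATE_BASE = 0.01
LEARNING_RATE_DECAY = 0.99
MOVING_AVERAGE_DECAY = 0.99
DECAY_STEPS = 55000 // 100  # assumed: 55,000 training examples / BATCH_SIZE


def staircase_learning_rate(global_step):
    """base * decay ** floor(global_step / decay_steps)."""
    return LEARNING_RATE_BASE * LEARNING_RATE_DECAY ** (global_step // DECAY_STEPS)


def effective_ema_decay(num_updates):
    """min(decay, (1 + num_updates) / (10 + num_updates))."""
    return min(MOVING_AVERAGE_DECAY, (1 + num_updates) / (10 + num_updates))


for step in (0, 549, 550, 6000):
    print(step, staircase_learning_rate(step), effective_ema_decay(step))
```

With these settings the learning rate drops by 1% once per pass over the training set, that is, every 550 steps. The moving-average decay starts at 0.1 and only reaches the configured 0.99 after a few hundred updates, which is why supplying the step variable lets the averages track the parameters more quickly early in training.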

5. Run the program

def main(argv=None):
    tf.reset_default_graph()
    train(mnist)


if __name__ == "__main__":
    main()

Results:

Your numbers may differ from these, because no random seed is fixed.

After 1 training step(s), loss on training batch is 3.35754
0.191
After 1001 training step(s), loss on training batch is 0.802243
0.963
After 2001 training step(s), loss on training batch is 0.751919
0.975
After 3001 training step(s), loss on training batch is 0.676599
0.978
After 4001 training step(s), loss on training batch is 0.723398
0.975
After 5001 training step(s), loss on training batch is 0.633572
0.979

《Tensorflow 实战Google深度学习框架》 study notes (6): recognizing the MNIST handwritten digit set with a LeNet-5 network