Handwritten digit recognition on the MNIST dataset (with TensorFlow)

The following techniques are used to improve accuracy:

  • Mini-batch stochastic gradient descent
  • The ReLU activation function to introduce non-linearity
  • L2 regularization to avoid overfitting
  • A learning rate with exponential decay
  • An exponential moving average model
  • A cross-entropy loss function to measure the gap between the predicted and true values (see the sketch after this list)
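
To make the last item concrete, here is a tiny NumPy illustration of softmax cross-entropy for a single example (my own sketch, not part of the original program; the numbers are made up):

import numpy as np

# Hypothetical 10-class output (logits) for one example, and its one-hot label
logits = np.array([1.0, 2.0, 0.5, 0.1, 0.0, 0.3, 0.2, 0.4, 0.6, 0.8])
label = np.zeros(10)
label[1] = 1.0  # the true digit is "1"

# softmax turns the logits into a probability distribution
probs = np.exp(logits) / np.sum(np.exp(logits))
# cross-entropy between the one-hot label and the predicted distribution
cross_entropy = -np.sum(label * np.log(probs))
print(cross_entropy)  # the smaller this value, the closer the prediction is to the label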

Step 1: import the MNIST dataset

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

Step 2: set the number of input and output nodes and configure the network parameters

# Constants related to the MNIST dataset
INPUT_NODE = 784   # input layer: 28*28 pixels
OUTPUT_NODE = 10   # output layer: 10 classes for the digits 0-9

# Network configuration: a three-layer network with a single hidden layer
LAYER1_NODE = 800  # number of hidden-layer nodes

BATCH_SIZE = 128   # number of samples packed into each batch; the smaller the batch, the closer
                   # training is to stochastic gradient descent, while a larger batch brings it
                   # closer to full-batch gradient descent

LEARNING_RATE_BASE = 0.9      # base learning rate
LEARNING_RATE_DECAY = 0.99    # decay rate of the learning rate
REGULARIZATION_RATE = 0.0001  # coefficient of the regularization term (model complexity) in the loss
TRAINING_STEPS = 20000        # number of training steps
MOVING_AVERAGE_DECAY = 0.99   # decay rate of the moving averages

Step 3: define a helper function that computes the forward-pass result, using ReLU as the activation function

def inference(input_tensor, avg_class, weights1, biases1, weights2, biases2):
    # When no moving-average class is provided, use the current values of the parameters directly
    if avg_class is None:
        # Forward pass through the hidden layer with the ReLU activation function
        layer1 = tf.nn.relu(tf.matmul(input_tensor, weights1) + biases1)
        # Forward pass through the output layer (softmax is applied together with the loss
        # computation later, so no softmax is performed here)
        return tf.matmul(layer1, weights2) + biases2
    else:
        # First use avg_class.average to obtain the moving averages of the variables,
        # then compute the corresponding forward-pass result
        layer1 = tf.nn.relu(tf.matmul(input_tensor, avg_class.average(weights1)) + avg_class.average(biases1))
        return tf.matmul(layer1, avg_class.average(weights2)) + avg_class.average(biases2)

In my experiments, the sigmoid activation function did not perform quite as well as ReLU, though the difference was not large; the test results are given in the results section below.
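
For comparison, here is a minimal sketch of the sigmoid version used in that experiment (my own illustration, not part of the original program; only the activation call changes, the rest of inference stays the same):

def inference_sigmoid(input_tensor, avg_class, weights1, biases1, weights2, biases2):
    # Same structure as inference() above, with tf.nn.sigmoid in place of tf.nn.relu
    if avg_class is None:
        layer1 = tf.nn.sigmoid(tf.matmul(input_tensor, weights1) + biases1)
        return tf.matmul(layer1, weights2) + biases2
    else:
        layer1 = tf.nn.sigmoid(tf.matmul(input_tensor, avg_class.average(weights1)) + avg_class.average(biases1))
        return tf.matmul(layer1, avg_class.average(weights2)) + avg_class.average(biases2)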

Step 4: define the training process:

# Define the training procedure
def train(mnist):
    x = tf.placeholder(tf.float32, [None, INPUT_NODE], name='x-input')
    y_ = tf.placeholder(tf.float32, [None, OUTPUT_NODE], name='y-input')

    # Hidden-layer parameters
    weights1 = tf.Variable(tf.truncated_normal([INPUT_NODE, LAYER1_NODE], stddev=0.1))  # truncated normal: values more than 2 standard deviations from the mean are re-drawn
    biases1 = tf.Variable(tf.constant(0.1, shape=[LAYER1_NODE]))  # constant initializer

    # Output-layer parameters
    weights2 = tf.Variable(tf.truncated_normal([LAYER1_NODE, OUTPUT_NODE], stddev=0.1))
    biases2 = tf.Variable(tf.constant(0.1, shape=[OUTPUT_NODE]))

    # Forward-pass result of the network under the current parameter values
    y = inference(x, None, weights1, biases1, weights2, biases2)

    # Variable that stores the number of training steps. It does not need a moving average, so it
    # is marked as not trainable (trainable=False); in TensorFlow the variable that holds the
    # training step count is conventionally made non-trainable.
    global_step = tf.Variable(0, trainable=False)

    # Initialize the moving-average class with the decay rate and the training-step variable.
    # Passing the training-step variable speeds up the updates of the averages early in training.
    variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)

    # Apply the moving average to all variables that represent network parameters; the auxiliary
    # variables do not need it. tf.trainable_variables() returns the elements of the
    # GraphKeys.TRAINABLE_VARIABLES collection, i.e. all variables not created with trainable=False.
    variable_averages_op = variable_averages.apply(tf.trainable_variables())
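    # (Illustration, not part of the original code) For each trainable variable v, apply()
    # keeps a shadow copy that is updated as
    #     shadow = decay * shadow + (1 - decay) * v
    # where decay = min(MOVING_AVERAGE_DECAY, (1 + global_step) / (10 + global_step)), which is
    # why passing global_step above speeds up the average's updates early in training.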
 
    # Forward-pass result computed with the moving averages of the parameters
    average_y = inference(x, variable_averages, weights1, biases1, weights2, biases2)

    # Cross-entropy and its mean: the cross-entropy measures the gap between the predicted and
    # true values. tf.nn.sparse_softmax_cross_entropy_with_logits speeds up the computation when
    # each example has exactly one correct class. The first argument is the forward-pass result
    # without the softmax layer; the second is the correct answer for the training data. Since
    # each label is a one-hot vector of length 10, tf.argmax extracts its class index.
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.argmax(y_, 1), logits=y)
    cross_entropy_mean = tf.reduce_mean(cross_entropy)  # mean cross-entropy over the current batch
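    # (Note, not part of the original code) The non-sparse variant
    # tf.nn.softmax_cross_entropy_with_logits could be used here instead; it takes the one-hot
    # labels y_ directly, without the tf.argmax conversion.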
    # L2 regularizer
    regularizer = tf.contrib.layers.l2_regularizer(REGULARIZATION_RATE)
    # Regularization loss of the model. Usually only the weights on the network edges are
    # regularized, not the bias terms.
    regularization = regularizer(weights1) + regularizer(weights2)
    # The total loss is the sum of the cross-entropy loss and the regularization loss
    loss = cross_entropy_mean + regularization
 
    # Learning rate with exponential decay
    learning_rate = tf.train.exponential_decay(
        LEARNING_RATE_BASE,                     # base learning rate; the rate actually used decays from this value as training proceeds
        global_step,                            # current training step, initialized to 0
        mnist.train.num_examples / BATCH_SIZE,  # number of steps needed for one pass over all the training data
        LEARNING_RATE_DECAY,                    # decay rate of the learning rate
        staircase=True)                         # staircase (step-wise) decay rather than a smooth curve
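    # (Illustration, not part of the original code) The decayed learning rate is
    #     LEARNING_RATE_BASE * LEARNING_RATE_DECAY ** (global_step / decay_steps)
    # and with staircase=True the exponent is truncated to an integer, so the learning rate only
    # drops once per full pass over the training data.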
 
    # Optimization: tf.train.GradientDescentOptimizer minimizes the loss, which here includes
    # both the cross-entropy and the L2 regularization loss
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

    # Backpropagation and moving-average updates: each pass over the data must both update the
    # network parameters via backpropagation and update the moving average of every parameter.
    # To run several operations as one, TensorFlow provides tf.control_dependencies and tf.group.
    with tf.control_dependencies([train_step, variable_averages_op]):
        train_op = tf.no_op(name='train')
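    # (Note, not part of the original code) tf.group would achieve the same effect:
    #     train_op = tf.group(train_step, variable_averages_op)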
 
    # Accuracy check for the moving-average model: tf.argmax(average_y, 1) gives the predicted
    # class of each example. average_y is a batch_size x 10 matrix in which each row is the
    # forward-pass result for one example; the second argument 1 means the maximum is taken
    # within each row, yielding a vector of length batch_size containing the predicted digit for
    # each example. tf.equal compares the two tensors element-wise and returns True where they match.
    correct_prediction = tf.equal(tf.argmax(average_y, 1), tf.argmax(y_, 1))  # compare predictions with the true labels
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))  # cast the booleans to floats and average them; this mean is the accuracy on the given data
 
    with tf.Session() as sess:
        tf.global_variables_initializer().run()
 
        # Prepare the data
        validate_feed = {x: mnist.validation.images, y_: mnist.validation.labels}  # validation data
        test_feed = {x: mnist.test.images, y_: mnist.test.labels}  # test data

        # Iteratively train the network
        for i in range(TRAINING_STEPS):
            # Every 1000 steps, evaluate on the validation data
            if i % 1000 == 0:
                validate_acc = sess.run(accuracy, feed_dict=validate_feed)
                print("After %d training step(s), validation accuracy "
                      "using average model is %g " % (i, validate_acc))

            # Fetch a batch of training data and run one training step
            xs, ys = mnist.train.next_batch(BATCH_SIZE)
            sess.run(train_op, feed_dict={x: xs, y_: ys})

        # Final accuracy on the test data
        test_acc = sess.run(accuracy, feed_dict=test_feed)
        print("After %d training step(s), test accuracy using average model is %g " % (TRAINING_STEPS, test_acc))

Step 5: the main program

# Entry point of the program
def main(argv=None):
    mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
    train(mnist)
 
 
if __name__ == '__main__':
    tf.app.run()

Results with the ReLU activation function:

After 0 training step(s), validation accuracy using average model is 0.091 
After 1000 training step(s), validation accuracy using average model is 0.9792 
After 2000 training step(s), validation accuracy using average model is 0.9822 
After 3000 training step(s), validation accuracy using average model is 0.9836 
After 4000 training step(s), validation accuracy using average model is 0.9826 
After 5000 training step(s), validation accuracy using average model is 0.9832 
After 6000 training step(s), validation accuracy using average model is 0.983 
After 7000 training step(s), validation accuracy using average model is 0.984 
After 8000 training step(s), validation accuracy using average model is 0.9828 
After 9000 training step(s), validation accuracy using average model is 0.9838 
After 10000 training step(s), validation accuracy using average model is 0.9834 
After 11000 training step(s), validation accuracy using average model is 0.9842 
After 12000 training step(s), validation accuracy using average model is 0.984 
After 13000 training step(s), validation accuracy using average model is 0.9834 
After 14000 training step(s), validation accuracy using average model is 0.9836 
After 15000 training step(s), validation accuracy using average model is 0.9842 
After 16000 training step(s), validation accuracy using average model is 0.9838 
After 17000 training step(s), validation accuracy using average model is 0.984 
After 18000 training step(s), validation accuracy using average model is 0.9854 
After 19000 training step(s), validation accuracy using average model is 0.9846 
After 20000 training step(s), test accuracy using average model is 0.9844

Results with the sigmoid activation function:

After 0 training step(s), validation accuracy using average model is 0.1202 
After 1000 training step(s), validation accuracy using average model is 0.9474 
After 2000 training step(s), validation accuracy using average model is 0.9634 
After 3000 training step(s), validation accuracy using average model is 0.9702 
After 4000 training step(s), validation accuracy using average model is 0.9726 
After 5000 training step(s), validation accuracy using average model is 0.9748 
After 6000 training step(s), validation accuracy using average model is 0.9766 
After 7000 training step(s), validation accuracy using average model is 0.977 
After 8000 training step(s), validation accuracy using average model is 0.978 
After 9000 training step(s), validation accuracy using average model is 0.9788 
After 10000 training step(s), validation accuracy using average model is 0.9782 
After 11000 training step(s), validation accuracy using average model is 0.979 
After 12000 training step(s), validation accuracy using average model is 0.9792 
After 13000 training step(s), validation accuracy using average model is 0.9792 
After 14000 training step(s), validation accuracy using average model is 0.9798 
After 15000 training step(s), validation accuracy using average model is 0.9796 
After 16000 training step(s), validation accuracy using average model is 0.9798 
After 17000 training step(s), validation accuracy using average model is 0.9796 
After 18000 training step(s), validation accuracy using average model is 0.9802 
After 19000 training step(s), validation accuracy using average model is 0.9806 
After 20000 training step(s), test accuracy using average model is 0.981 

The program above is based on the book《TensorFlow实战Google深度学习框架》. Below is a simple network I put together myself (with no hidden layer), which reaches a noticeably lower accuracy:

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# Load the data
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)

# Model input and output placeholders
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])

# Model weights and biases
W = tf.Variable(tf.truncated_normal([784, 10], stddev=0.1))
b = tf.Variable(tf.constant(0.1, shape=[10]))

# Create a session
sess = tf.InteractiveSession()
# Initialize the variables
sess.run(tf.global_variables_initializer())

y = tf.nn.softmax(tf.matmul(x, W) + b)

# Cross-entropy loss
cross_entropy = -tf.reduce_sum(y_*tf.log(y))

# Training
train = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
for i in range(1000):
    batch = mnist.train.next_batch(100)  # read a batch of training data
    train.run(feed_dict={x: batch[0], y_: batch[1]})  # run one training step

# Testing
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))  # accuracy on the test set

Output of the run:

Extracting MNIST_data\train-images-idx3-ubyte.gz
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz
0.9183
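
One caveat about this simple script: the hand-written cross-entropy -tf.reduce_sum(y_*tf.log(y)) can hit log(0) when the softmax output saturates. Below is a minimal sketch of a more stable variant, using the same fused softmax/cross-entropy family of ops as the first program (my own illustration, not part of the original post):

# Keep the raw logits and let TensorFlow fuse softmax with the cross-entropy, which avoids log(0)
logits = tf.matmul(x, W) + b
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))
train = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)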

Reposted from blog.csdn.net/zhao2018/article/details/83449019