TensorFlow Basics (5): How to Prevent Overfitting with Dropout

1. Fitting

Possible fitting outcomes for a regression problem:
[Figure: fitting cases for a regression problem]
Possible fitting outcomes for a classification problem:
[Figure: fitting cases for a classification problem]
Overfitting means the model fits the training samples very well, perhaps even perfectly, but its accuracy drops sharply on a batch of new samples. A proper fit should achieve consistently good accuracy on both the training samples and new samples.

2. Methods to Prevent Overfitting

Overfitting usually arises when the dataset is too small and the neural network is too complex. It is like solving a system of equations with too few knowns and too many unknowns: there is not enough information to pin down the right solution.
Three methods are commonly used to prevent overfitting:
[Figure: the original image listed the three methods; typical choices are enlarging the dataset, regularization, and Dropout]
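Since the original figure is unavailable, here is a minimal hedged sketch of one of these remedies, L2 regularization (weight decay), in the same TF1 style used below; the layer sizes and the coefficient lam are illustrative choices, not values from the original post:

import tensorflow as tf

# Illustrative weights of a small two-layer model (sizes are assumptions)
W1 = tf.Variable(tf.truncated_normal([784,100],stddev=0.1))
W2 = tf.Variable(tf.truncated_normal([100,10],stddev=0.1))
lam = 0.0005  # regularization strength (illustrative value)

# tf.nn.l2_loss(t) computes sum(t**2)/2, so this term penalizes large weights
l2_penalty = lam * (tf.nn.l2_loss(W1) + tf.nn.l2_loss(W2))
# The penalty is simply added to the usual data loss before minimizing:
# loss = cross_entropy + l2_penalty

Pushing weights toward zero in this way limits how sharply the network can fit noise in a small training set, which is the same goal Dropout pursues by a different mechanism.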

3. Demo Code

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Load the dataset
mnist = input_data.read_data_sets("MNIST_data",one_hot=True)

# Size of each batch
batch_size = 100
# Compute how many batches there are in total
n_batch = mnist.train.num_examples // batch_size

# Define the placeholders (x, y, and keep_prob for Dropout)
x = tf.placeholder(tf.float32,[None,784])
y = tf.placeholder(tf.float32,[None,10])
keep_prob = tf.placeholder(tf.float32)

# Deliberately build an over-complex network so we can compare training with and without Dropout
W1 = tf.Variable(tf.truncated_normal([784,2000],stddev=0.1))
b1 = tf.Variable(tf.zeros([2000])+0.1)
L1 = tf.nn.tanh(tf.matmul(x,W1)+b1)
L1_drop = tf.nn.dropout(L1,keep_prob)

W2 = tf.Variable(tf.truncated_normal([2000,2000],stddev=0.1))
b2 = tf.Variable(tf.zeros([2000])+0.1)
L2 = tf.nn.tanh(tf.matmul(L1_drop,W2)+b2)
L2_drop = tf.nn.dropout(L2,keep_prob)

W3 = tf.Variable(tf.truncated_normal([2000,1000],stddev=0.1))
b3 = tf.Variable(tf.zeros([1000])+0.1)
L3 = tf.nn.tanh(tf.matmul(L2_drop,W3)+b3)
L3_drop = tf.nn.dropout(L3,keep_prob)

W4 = tf.Variable(tf.truncated_normal([1000,10],stddev=0.1))
b4 = tf.Variable(tf.zeros([10])+0.1)
logits = tf.matmul(L3_drop,W4) + b4
prediction = tf.nn.softmax(logits)

# Quadratic cost function:
# loss = tf.reduce_mean(tf.square(y-prediction))
# Softmax cross-entropy cost function. Note that
# softmax_cross_entropy_with_logits expects raw logits; feeding it the
# softmax output would apply softmax twice, so we pass `logits` here.
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y,logits=logits))
# Train with gradient descent
train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)

# Initialize the variables
init = tf.global_variables_initializer()

# Store the results in a list of booleans
# argmax returns the index of the largest value along the given axis of a tensor
correct_prediction = tf.equal(tf.argmax(y,1),tf.argmax(prediction,1))
# Compute the accuracy
accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(31):
        for batch in range(n_batch):
            batch_xs,batch_ys = mnist.train.next_batch(batch_size)
            # keep_prob:0.7 drops ~30% of neurons each step; feed 1.0 here to reproduce the no-Dropout run below
            sess.run(train_step,feed_dict = {x:batch_xs,y:batch_ys,keep_prob:0.7})
        
        acc_test = sess.run(accuracy,feed_dict = {x:mnist.test.images,y:mnist.test.labels,keep_prob:1.0})
        acc_train = sess.run(accuracy,feed_dict = {x:mnist.train.images,y:mnist.train.labels,keep_prob:1.0})
        print("Iter"+str(epoch)+",Testing Accuracy"+str(acc_test)+",Training Accuracy"+str(acc_train))

Results when training with keep_prob = 1.0:

#Dropout=1.0
Iter0,Testing Accuracy0.9501,Training Accuracy0.9604727
Iter1,Testing Accuracy0.9582,Training Accuracy0.97545457
Iter2,Testing Accuracy0.964,Training Accuracy0.98285455
Iter3,Testing Accuracy0.9653,Training Accuracy0.9866
Iter4,Testing Accuracy0.9676,Training Accuracy0.9883818
Iter5,Testing Accuracy0.9684,Training Accuracy0.9896909
Iter6,Testing Accuracy0.9698,Training Accuracy0.9906
Iter7,Testing Accuracy0.9699,Training Accuracy0.99127275
Iter8,Testing Accuracy0.9706,Training Accuracy0.9918
Iter9,Testing Accuracy0.97,Training Accuracy0.9923273
Iter10,Testing Accuracy0.9702,Training Accuracy0.9927818
Iter11,Testing Accuracy0.9699,Training Accuracy0.9931091
Iter12,Testing Accuracy0.9707,Training Accuracy0.99334544
Iter13,Testing Accuracy0.9705,Training Accuracy0.99365455
Iter14,Testing Accuracy0.9714,Training Accuracy0.9938545
Iter15,Testing Accuracy0.971,Training Accuracy0.9940182
Iter16,Testing Accuracy0.9708,Training Accuracy0.9942182
Iter17,Testing Accuracy0.9712,Training Accuracy0.99432725
Iter18,Testing Accuracy0.9708,Training Accuracy0.9944909
Iter19,Testing Accuracy0.9711,Training Accuracy0.9946
Iter20,Testing Accuracy0.9716,Training Accuracy0.99472725
Iter21,Testing Accuracy0.9714,Training Accuracy0.9948364
Iter22,Testing Accuracy0.9716,Training Accuracy0.99485457
Iter23,Testing Accuracy0.9718,Training Accuracy0.99496365
Iter24,Testing Accuracy0.9719,Training Accuracy0.99505454
Iter25,Testing Accuracy0.9714,Training Accuracy0.9951636
Iter26,Testing Accuracy0.9714,Training Accuracy0.9952545
Iter27,Testing Accuracy0.9717,Training Accuracy0.9953091
Iter28,Testing Accuracy0.9716,Training Accuracy0.99538183
Iter29,Testing Accuracy0.9713,Training Accuracy0.9954364
Iter30,Testing Accuracy0.9716,Training Accuracy0.9954727

Results when training with keep_prob = 0.7:

#Dropout=0.7
Iter0,Testing Accuracy0.9152,Training Accuracy0.91032726
Iter1,Testing Accuracy0.9309,Training Accuracy0.9278
Iter2,Testing Accuracy0.9376,Training Accuracy0.9334
Iter3,Testing Accuracy0.9399,Training Accuracy0.939
Iter4,Testing Accuracy0.9458,Training Accuracy0.9450182
Iter5,Testing Accuracy0.9455,Training Accuracy0.94734544
Iter6,Testing Accuracy0.9487,Training Accuracy0.9498364
Iter7,Testing Accuracy0.9522,Training Accuracy0.9538
Iter8,Testing Accuracy0.9533,Training Accuracy0.9561273
Iter9,Testing Accuracy0.9556,Training Accuracy0.9581091
Iter10,Testing Accuracy0.9564,Training Accuracy0.9590909
Iter11,Testing Accuracy0.9573,Training Accuracy0.9617091
Iter12,Testing Accuracy0.9588,Training Accuracy0.9626727
Iter13,Testing Accuracy0.9592,Training Accuracy0.96376365
Iter14,Testing Accuracy0.9623,Training Accuracy0.96532726
Iter15,Testing Accuracy0.9611,Training Accuracy0.9666182
Iter16,Testing Accuracy0.9629,Training Accuracy0.96805453
Iter17,Testing Accuracy0.9644,Training Accuracy0.9690727
Iter18,Testing Accuracy0.9651,Training Accuracy0.96985453
Iter19,Testing Accuracy0.9652,Training Accuracy0.97105455
Iter20,Testing Accuracy0.9661,Training Accuracy0.9717818
Iter21,Testing Accuracy0.9661,Training Accuracy0.9724182
Iter22,Testing Accuracy0.9661,Training Accuracy0.97276366
Iter23,Testing Accuracy0.9676,Training Accuracy0.97403634
Iter24,Testing Accuracy0.969,Training Accuracy0.9750182
Iter25,Testing Accuracy0.9699,Training Accuracy0.975
Iter26,Testing Accuracy0.9684,Training Accuracy0.97556365
Iter27,Testing Accuracy0.969,Training Accuracy0.97663635
Iter28,Testing Accuracy0.9699,Training Accuracy0.97694546
Iter29,Testing Accuracy0.9703,Training Accuracy0.97761816
Iter30,Testing Accuracy0.9706,Training Accuracy0.9779091

Analysis of the results:
When keep_prob = 1.0, all neurons are active during training, which is equivalent to not using Dropout at all. When keep_prob = 0.7, each neuron is kept with probability 70% on any given training step, so roughly 70% of the neurons work while the other 30% are switched off.
From the results:
1. With keep_prob = 1.0, the test accuracy and the training accuracy diverge noticeably. This is overfitting: training accuracy is high, but new samples cannot reach it. With keep_prob = 0.7, test and training accuracy stay essentially in step, so Dropout does prevent overfitting.
2. With keep_prob = 0.7, accuracy clearly converges more slowly than with keep_prob = 1.0. In other words, Dropout slows down convergence.
3. Both runs still reach a final test accuracy of about 0.97, so why use Dropout at all? Because this is not a particularly good showcase: the network here is not complex enough. If we trained a genuinely complex convolutional network such as GoogLeNet on a training set of only about 50,000 examples, the data would be far from sufficient, and without Dropout the model would overfit badly. The gap between test and training accuracy would then be far larger than the roughly 0.02 seen here.
Dropout's importance shows up much more clearly when a very complex network is trained on a very small dataset.
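For readers on newer TensorFlow, a rough Keras equivalent of the network above might look like the sketch below (layer sizes copied from the code above; this port is an assumption, not part of the original post). Note that tf.keras.layers.Dropout takes a drop rate, so keep_prob = 0.7 corresponds to rate = 0.3, and Keras disables dropout automatically during evaluation and prediction, mirroring the manual keep_prob = 1.0 feed used above.

import tensorflow as tf  # TF2.x

model = tf.keras.Sequential([
    tf.keras.layers.Dense(2000, activation='tanh', input_shape=(784,)),
    tf.keras.layers.Dropout(0.3),   # rate = 1 - keep_prob
    tf.keras.layers.Dense(2000, activation='tanh'),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1000, activation='tanh'),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(10, activation='softmax'),
])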


Source: blog.csdn.net/star_of_science/article/details/104245506