Resist overfitting

  • Increase the data set
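
Before touching the model itself, the simplest remedy is more (or artificially enlarged) training data. As an illustration only (this is not part of the original program), a hypothetical augment_with_shifts helper could create shifted copies of the MNIST images, assuming they have been loaded as flat 784-dimensional numpy arrays:

import numpy as np

def augment_with_shifts(images, labels):
    # Create extra samples by shifting each 28x28 image one pixel up, down,
    # left and right. np.roll wraps pixels around the border, which is
    # acceptable here because MNIST borders are almost entirely black.
    imgs = images.reshape(-1, 28, 28)
    shifted = [imgs]
    for axis, step in [(1, 1), (1, -1), (2, 1), (2, -1)]:
        shifted.append(np.roll(imgs, step, axis=axis))
    new_images = np.concatenate(shifted).reshape(-1, 784)
    new_labels = np.concatenate([labels] * len(shifted))
    return new_images, new_labels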

  • Early stopping

If the best accuracy has not improved for 10 consecutive epochs, training stops early.
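
A minimal sketch of that patience rule (an illustration, not part of the program below); evaluate_accuracy() is a hypothetical helper standing in for whatever computes accuracy after each epoch:

best_acc = 0.0
patience = 10            # stop after 10 epochs without a new best accuracy
epochs_since_best = 0

for epoch in range(100):
    # ... train for one epoch here ...
    acc = evaluate_accuracy()   # hypothetical: returns test/validation accuracy
    if acc > best_acc:
        best_acc = acc
        epochs_since_best = 0
    else:
        epochs_since_best += 1
    if epochs_since_best >= patience:
        print("Early stopping at epoch", epoch)
        break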

  • Dropout

Dropout: during training, a random fraction of neurons is left out of each update; during testing, all neurons are used.
Conclusion: Dropout can resist over-fitting, but it is not suitable for all situations; it is generally used on more complex neural networks, where its effect is better.
This program uses cross-entropy as the loss function, and the run converges somewhat slowly (MSE is generally used for regression problems; cross-entropy is generally used for classification problems).

Dropout requires a predefined keep_prob placeholder, which is fed through the session and determines the proportion of neurons kept during training. Setting it to 0.5 here means that 50% of the neurons are used for training.

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
tf.compat.v1.disable_eager_execution()
# Load the data set; one_hot = True uses one-hot encoding, e.g. 1 -> 0100000000, 5 -> 0000010000
mnist = input_data.read_data_sets("mnist_data",one_hot = True)

# Batch size: 64 samples per training step; typical batch sizes are 16, 32, 64
batch_size = 64
# Compute the total number of batches
n_batch = mnist.train.num_examples // batch_size

# Build a 784-1000-500-10 neural network
# Define 3 placeholders
x = tf.compat.v1.placeholder(tf.float32,[None,784])
y = tf.compat.v1.placeholder(tf.float32,[None,10])
keep_prob = tf.compat.v1.placeholder(tf.float32)

# 784-1000-500-10
W1 = tf.Variable(tf.random.truncated_normal([784,1000],stddev = 0.1))
b1 = tf.Variable(tf.zeros([1000]) + 0.1)
L1 = tf.nn.tanh(tf.matmul(x,W1) + b1)
L1_drop = tf.compat.v1.nn.dropout(L1,keep_prob)

W2 = tf.Variable(tf.random.truncated_normal([1000,500],stddev = 0.1))
b2 = tf.Variable(tf.zeros([500]) + 0.1)
L2 = tf.nn.tanh(tf.matmul(L1_drop,W2) + b2)
L2_drop = tf.compat.v1.nn.dropout(L2,keep_prob)

W3 = tf.Variable(tf.random.truncated_normal([500,10],stddev = 0.1))
b3 = tf.Variable(tf.zeros([10]) + 0.1)
prediction = tf.nn.softmax(tf.matmul(L2_drop,W3) + b3)

# Cross-entropy cost function
loss = tf.compat.v1.losses.softmax_cross_entropy(y,prediction)
# loss = tf.compat.v1.losses.mean_squared_error(y,prediction)

# Training
train = tf.compat.v1.train.GradientDescentOptimizer(0.5).minimize(loss)

# correct_prediction yields a boolean tensor; tf.argmax(y,1) returns the index of the maximum value in each row (change 1 to 0 to take each column instead)
correct_prediction = tf.equal(tf.argmax(y,1),tf.argmax(prediction,1))
# Accuracy calculation
accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    # One pass over all the data is one epoch
    for epoch in range(31):
        for batch in range(n_batch):
            # Get one batch of data and labels
            batch_xs,batch_ys = mnist.train.next_batch(batch_size)        
            # 50% of the neurons work, 50% do not
            sess.run(train,feed_dict = {x:batch_xs,y:batch_ys,keep_prob:0.5})
        test_acc = sess.run(accuracy,feed_dict = {x:mnist.test.images,y:mnist.test.labels,keep_prob:1.0})
        train_acc = sess.run(accuracy,feed_dict = {x:mnist.train.images,y:mnist.train.labels,keep_prob:1.0})
        print("Iter" + str(epoch) + ".Testing Accuracy" + str(test_acc) + ".Training Accuracy" + str(train_acc))            

Running results
Extracting mnist_data\train-images-idx3-ubyte.gz
Extracting mnist_data\train-labels-idx1-ubyte.gz
Extracting mnist_data\t10k-images-idx3-ubyte.gz
Extracting mnist_data\t10k-labels-idx1-ubyte.gz
Iter0.Testing Accuracy0.9198.Training Accuracy0.9161818
Iter1.Testing Accuracy0.9278.Training Accuracy0.9264909
Iter2.Testing Accuracy0.9345.Training Accuracy0.9334
Iter3.Testing Accuracy0.9388.Training Accuracy0.93883634
Iter4.Testing Accuracy0.9437.Training Accuracy0.94272727
Iter5.Testing Accuracy0.9458.Training Accuracy0.9455091
Iter6.Testing Accuracy0.9505.Training Accuracy0.9504182
Iter7.Testing Accuracy0.9497.Training Accuracy0.9513818
Iter8.Testing Accuracy0.9537.Training Accuracy0.95476365
Iter9.Testing Accuracy0.9519.Training Accuracy0.9536545
Iter10.Testing Accuracy0.9528.Training Accuracy0.9556909
Iter11.Testing Accuracy0.9559.Training Accuracy0.959
Iter12.Testing Accuracy0.9575.Training Accuracy0.9602
Iter13.Testing Accuracy0.9575.Training Accuracy0.9602727
Iter14.Testing Accuracy0.9594.Training Accuracy0.9617818
Iter15.Testing Accuracy0.9603.Training Accuracy0.96403635
Iter16.Testing Accuracy0.9601.Training Accuracy0.96425456
Iter17.Testing Accuracy0.9608.Training Accuracy0.9647091
Iter18.Testing Accuracy0.9624.Training Accuracy0.9662
Iter19.Testing Accuracy0.9619.Training Accuracy0.96703637
Iter20.Testing Accuracy0.9642.Training Accuracy0.9677455
Iter21.Testing Accuracy0.9644.Training Accuracy0.9684909
Iter22.Testing Accuracy0.9642.Training Accuracy0.9689818
Iter23.Testing Accuracy0.9648.Training Accuracy0.9697091
Iter24.Testing Accuracy0.9647.Training Accuracy0.9696364
Iter25.Testing Accuracy0.9658.Training Accuracy0.9701091
Iter26.Testing Accuracy0.9654.Training Accuracy0.9706182
Iter27.Testing Accuracy0.9662.Training Accuracy0.97165453
Iter28.Testing Accuracy0.9658.Training Accuracy0.9710182
Iter29.Testing Accuracy0.9659.Training Accuracy0.97243637
Iter30.Testing Accuracy0.9687.Training Accuracy0.97243637

  • Regularization

Regularization is not suitable for every neural network; it works best on more complex network structures, where it helps resist over-fitting. The regularization change amounts to just two lines of code in the loss-function part.

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
tf.compat.v1.disable_eager_execution()
# Load the data set; one_hot = True uses one-hot encoding, e.g. 1 -> 0100000000, 5 -> 0000010000
mnist = input_data.read_data_sets("mnist_data",one_hot = True)

# Batch size: 64 samples per training step; typical batch sizes are 16, 32, 64
batch_size = 64
# Compute the total number of batches
n_batch = mnist.train.num_examples // batch_size

# Build a 784-1000-500-10 neural network
# Define 3 placeholders
x = tf.compat.v1.placeholder(tf.float32,[None,784])
y = tf.compat.v1.placeholder(tf.float32,[None,10])
keep_prob = tf.compat.v1.placeholder(tf.float32)

# 784-1000-500-10
W1 = tf.Variable(tf.random.truncated_normal([784,1000],stddev = 0.1))
b1 = tf.Variable(tf.zeros([1000]) + 0.1)
L1 = tf.nn.tanh(tf.matmul(x,W1) + b1)
L1_drop = tf.compat.v1.nn.dropout(L1,keep_prob)

W2 = tf.Variable(tf.random.truncated_normal([1000,500],stddev = 0.1))
b2 = tf.Variable(tf.zeros([500]) + 0.1)
L2 = tf.nn.tanh(tf.matmul(L1_drop,W2) + b2)
L2_drop = tf.compat.v1.nn.dropout(L2,keep_prob)

W3 = tf.Variable(tf.random.truncated_normal([500,10],stddev = 0.1))
b3 = tf.Variable(tf.zeros([10]) + 0.1)
prediction = tf.nn.softmax(tf.matmul(L2_drop,W3) + b3)

# L2 regularization term (for L1 regularization, replace tf.nn.l2_loss with a sum of absolute values, e.g. tf.reduce_sum(tf.abs(W1)))
l2_loss = tf.nn.l2_loss(W1) + tf.nn.l2_loss(b1) + tf.nn.l2_loss(W2) + tf.nn.l2_loss(b2) + tf.nn.l2_loss(W3) + tf.nn.l2_loss(b3)
# Cross-entropy cost function with the l2_loss term added
loss = tf.compat.v1.losses.softmax_cross_entropy(y,prediction) + 0.0005*l2_loss


# Training
train = tf.compat.v1.train.GradientDescentOptimizer(0.5).minimize(loss)

# correct_prediction yields a boolean tensor; tf.argmax(y,1) returns the index of the maximum value in each row (change 1 to 0 to take each column instead)
correct_prediction = tf.equal(tf.argmax(y,1),tf.argmax(prediction,1))
# Accuracy calculation
accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    # One pass over all the data is one epoch
    for epoch in range(31):
        for batch in range(n_batch):
            # Get one batch of data and labels
            batch_xs,batch_ys = mnist.train.next_batch(batch_size)        
            # 50% of the neurons work, 50% do not
            sess.run(train,feed_dict = {x:batch_xs,y:batch_ys,keep_prob:0.5})
        test_acc = sess.run(accuracy,feed_dict = {x:mnist.test.images,y:mnist.test.labels,keep_prob:1.0})
        train_acc = sess.run(accuracy,feed_dict = {x:mnist.train.images,y:mnist.train.labels,keep_prob:1.0})
        print("Iter" + str(epoch) + ".Testing Accuracy" + str(test_acc) + ".Training Accuracy" + str(train_acc))            


Running results
Extracting mnist_data\train-images-idx3-ubyte.gz
Extracting mnist_data\train-labels-idx1-ubyte.gz
Extracting mnist_data\t10k-images-idx3-ubyte.gz
Extracting mnist_data\t10k-labels-idx1-ubyte.gz
Iter0.Testing Accuracy0.9202.Training Accuracy0.9148
Iter1.Testing Accuracy0.9241.Training Accuracy0.92285454
Iter2.Testing Accuracy0.9305.Training Accuracy0.93072724
Iter3.Testing Accuracy0.9349.Training Accuracy0.93430907
Iter4.Testing Accuracy0.9335.Training Accuracy0.9341273
Iter5.Testing Accuracy0.9377.Training Accuracy0.9386727
Iter6.Testing Accuracy0.9407.Training Accuracy0.9416364
Iter7.Testing Accuracy0.9245.Training Accuracy0.9260727
Iter8.Testing Accuracy0.9321.Training Accuracy0.93332726
Iter9.Testing Accuracy0.9448.Training Accuracy0.9448
Iter10.Testing Accuracy0.9354.Training Accuracy0.9371273
Iter11.Testing Accuracy0.9412.Training Accuracy0.9432727
Iter12.Testing Accuracy0.9414.Training Accuracy0.9434182
Iter13.Testing Accuracy0.9438.Training Accuracy0.9452364
Iter14.Testing Accuracy0.9474.Training Accuracy0.94765455
Iter15.Testing Accuracy0.9462.Training Accuracy0.9472
Iter16.Testing Accuracy0.9469.Training Accuracy0.94698185
Iter17.Testing Accuracy0.945.Training Accuracy0.9432909
Iter18.Testing Accuracy0.9423.Training Accuracy0.9446909
Iter19.Testing Accuracy0.9463.Training Accuracy0.94690907
Iter20.Testing Accuracy0.9417.Training Accuracy0.9429455
Iter21.Testing Accuracy0.9438.Training Accuracy0.9458727
Iter22.Testing Accuracy0.9417.Training Accuracy0.9422
Iter23.Testing Accuracy0.9461.Training Accuracy0.9457091
Iter24.Testing Accuracy0.9483.Training Accuracy0.94796365
Iter25.Testing Accuracy0.9459.Training Accuracy0.9464
Iter26.Testing Accuracy0.9473.Training Accuracy0.9472727
Iter27.Testing Accuracy0.9452.Training Accuracy0.94732726
Iter28.Testing Accuracy0.9483.Training Accuracy0.94914544
Iter29.Testing Accuracy0.9425.Training Accuracy0.9444
Iter30.Testing Accuracy0.9471.Training Accuracy0.9500545

Source: blog.csdn.net/weixin_44823313/article/details/112527875