TensorFlow study notes: overfitting

For neural network overfitting, the following solutions are commonly used:
1. Introduce dropout: during training, randomly deactivate some neurons. This increases the sparsity of the network, which helps feature selection and prevents overfitting (demonstrated in the full example below).
2. Introduce batch normalization: standardize not only the input data of the input layer (subtract the mean, divide by the standard deviation) but also the inputs of the hidden layers, which helps prevent overfitting (see the first sketch after this list).
3. Introduce L1 regularization: add a penalty term to the loss function during training so that, as the weights are updated, many of them are driven exactly to 0. This increases the sparsity of the network, which helps feature extraction and prevents overfitting (see the second sketch after this list).
4. Introduce L2 regularization: add a penalty term to the loss function during training so that the magnitudes of the weights keep shrinking as they are updated, which prevents overfitting (also covered in the second sketch after this list).
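For item 2, here is a minimal sketch of batch normalization written in the same TF 1.x style as the dropout example below. It is my illustration rather than part of the original post; it reuses x, w1, b1, loss and optimizer from that example, and the is_training placeholder is an assumed addition:

# Batch-normalize the pre-activation of the first hidden layer
is_training=tf.placeholder(tf.bool)
z1=tf.matmul(x,w1)+b1
# training=True uses the current batch statistics; False uses the moving averages accumulated during training
z1_bn=tf.layers.batch_normalization(z1,training=is_training)
l1=tf.nn.tanh(z1_bn)

# Batch normalization creates moving-average update ops that must run together with the training op
update_ops=tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
	train=optimizer.minimize(loss)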
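For items 3 and 4, a minimal sketch of adding L1 and L2 penalties to the loss, again in the style of the example below. The weight variables w1, w2, w3 and the cross-entropy loss refer to that example, and the penalty coefficients are illustrative values I picked, not tuned ones:

# Illustrative regularization strengths (hypothetical values)
lambda_l1=0.0005
lambda_l2=0.0005
# L1 penalty: sum of absolute weights; its gradient pushes many weights exactly to 0
l1_penalty=lambda_l1*(tf.reduce_sum(tf.abs(w1))+tf.reduce_sum(tf.abs(w2))+tf.reduce_sum(tf.abs(w3)))
# L2 penalty: tf.nn.l2_loss(w) computes sum(w**2)/2, so this term keeps shrinking weight magnitudes
l2_penalty=lambda_l2*(tf.nn.l2_loss(w1)+tf.nn.l2_loss(w2)+tf.nn.l2_loss(w3))
# Minimize the cross-entropy loss plus one (or both) of the penalties
loss_l1=loss+l1_penalty
loss_l2=loss+l2_penalty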
The full example below demonstrates how to use dropout:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Load the dataset
# one_hot=True converts each label to a one-hot vector such as [1,0,0,0,0,0,0,0,0,0]
mnist=input_data.read_data_sets('MNIST_data',one_hot=True)

# Define the batch size
batch_size=100
n_batch=mnist.train.num_examples//batch_size

# Define three placeholders
# The row dimension is None, i.e. any number of rows; here it will be 100, determined by batch_size
# The column dimension is 784, because each 28*28 input image is flattened to 1*784
x=tf.placeholder(tf.float32,[None,784])
y=tf.placeholder(tf.float32,[None,10])
keep_prob=tf.placeholder(tf.float32)

# Define the network
# Initializing the weights to 0 is not optimal; truncated-normal random values converge faster
w1=tf.Variable(tf.truncated_normal([784,1000],stddev=0.1))
# Initializing the biases to 0 is not optimal either; 0.1 converges faster
b1=tf.Variable(tf.zeros([1000])+0.1)
# Apply the activation function
l1=tf.nn.tanh(tf.matmul(x,w1)+b1)
# Apply dropout
l1_drop=tf.nn.dropout(l1,keep_prob)

w2=tf.Variable(tf.truncated_normal([1000,100],stddev=0.1))
b2=tf.Variable(tf.zeros([100])+0.1)
l2=tf.nn.tanh(tf.matmul(l1_drop,w2)+b2)
l2_drop=tf.nn.dropout(l2,keep_prob)

w3=tf.Variable(tf.truncated_normal([100,10],stddev=0.1))
b3=tf.Variable(tf.zeros([10])+0.1)

# softmax converts the logits tf.matmul(l2_drop,w3)+b3 into probabilities, for example:
# [9,2,1,1,2,1,1,2,1,1]
# becomes [0.99527,0.00091,0.00033,0.00033,0.00091,0.00033,0.00033,0.00091,0.00033,0.00033]
logits=tf.matmul(l2_drop,w3)+b3
prediction=tf.nn.softmax(logits)

# Define the loss function
# With a softmax output layer, the cross-entropy loss converges faster than mean squared error
# loss=tf.reduce_mean(tf.square(y-prediction))
# Note: softmax_cross_entropy_with_logits applies softmax internally, so it must be fed the raw logits, not the already-softmaxed prediction
loss=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y,logits=logits))

# Define the optimizer
optimizer=tf.train.GradientDescentOptimizer(0.2)

# Define the training op; the optimizer adjusts the variables that loss depends on to make it smaller
train=optimizer.minimize(loss)

# Compute the accuracy
# tf.argmax returns the index of the largest value in its first argument (along axis 1 here)
# tf.equal compares two tensors element-wise and returns True or False
correct_prediction=tf.equal(tf.argmax(y,1),tf.argmax(prediction,1))
# tf.cast converts the booleans to floats
accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

with tf.Session() as sess:
	sess.run(tf.global_variables_initializer())
	# One epoch = one full pass over all the batches
	for epoch in range(20):
		for batch in range(n_batch):
			# Fetch batch_size examples for each training step
			batch_xs,batch_ys=mnist.train.next_batch(batch_size)
			sess.run(train,feed_dict={x:batch_xs,y:batch_ys,keep_prob:0.9})
		# Disable dropout (keep_prob=1.0) when evaluating accuracy
		test_acc=sess.run(accuracy,feed_dict={x:mnist.test.images,y:mnist.test.labels,keep_prob:1.0})
		train_acc=sess.run(accuracy,feed_dict={x:mnist.train.images,y:mnist.train.labels,keep_prob:1.0})
		print('epoch=',epoch,' ','test_acc=',test_acc,' ','train_acc=',train_acc)

Running results:
(screenshot of the per-epoch test/train accuracy output omitted)
Evaluating the accuracy on both the test set and the training set shows that the two results are close, which indicates the model is not overfitting.
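
As a side note on how dropout behaves at the tensor level, here is a tiny standalone sketch (my addition, not from the original post). With keep_prob=0.9, roughly 10% of the activations are zeroed and the survivors are scaled by 1/keep_prob, so the expected activation stays the same whether dropout is on or off:

import tensorflow as tf

with tf.Session() as sess:
	ones=tf.ones([1,10])
	# Each kept entry becomes 1/0.9≈1.111; dropped entries become 0
	print(sess.run(tf.nn.dropout(ones,keep_prob=0.9)))
	# Example output (which entries are dropped varies per run):
	# [[1.111 1.111 0. 1.111 1.111 1.111 1.111 1.111 1.111 1.111]]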

Origin: blog.csdn.net/wxsy024680/article/details/114541891