TensorFlow study notes: loss functions

1. Mean square error loss function
The partial derivative of the loss function with respect to the weights is proportional to the gradient of the activation function. If the activation function is linear, this loss function works fine; if the activation function is a sigmoid, it is a poor fit, for the following reason (points A and B refer to the figure below):
If we expect the output to be 1: A is far from 1, where the activation function's gradient is large, so the optimizer takes a large step; B is close to 1, where the gradient is small, so the optimizer takes a small step. This is reasonable.
If we expect the output to be 0: A is far from 0, the gradient is large, and the step is large; B is even farther from 0, yet the gradient is small and the step is small, so it takes a long time for the output to come back down to 0. This is unreasonable.
[Figure: sigmoid curve with points A and B marking the two operating regions]
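To make this concrete, here is a minimal NumPy sketch (my own illustration, not from the original post; the single neuron, input, and the two operating points standing in for A and B are assumed) showing that the MSE weight gradient is scaled by the sigmoid's own gradient, so a saturated neuron with a large error still receives a tiny update:

import numpy as np

def sigmoid(z):
    return 1.0/(1.0+np.exp(-z))

# Single neuron a = sigmoid(w*x+b) with MSE loss L = (a-y)^2/2:
# dL/dw = (a-y)*sigmoid'(z)*x, and sigmoid'(z) = a*(1-a)
x,y = 1.0,0.0                        # target output is 0
for w,b in [(0.5,0.5),(4.0,4.0)]:    # second point sits deep in the saturated region
    z = w*x+b
    a = sigmoid(z)
    grad_w = (a-y)*a*(1-a)*x         # MSE gradient with respect to the weight
    print('a=%.4f  error=%.4f  dL/dw=%.6f'%(a,a-y,grad_w))
# The saturated point has the larger error but the much smaller gradient,
# which is why MSE + sigmoid can be very slow to correct a wrong output.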
2. Cross-entropy loss function
The partial derivative of the loss function with respect to the weights does not depend on the gradient of the activation function; it is proportional to the difference between the predicted value and the true value. This loss function can be used whether the activation function is linear or a sigmoid. When the deviation between the predicted value and the true value is large, the optimizer takes a large step; when the deviation is small, it takes a small step, which is reasonable.
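A quick sketch of why this holds (again my own illustration, not from the post): with a = sigmoid(z) and cross-entropy loss L = -[y*log(a)+(1-y)*log(1-a)], the sigmoid' factor cancels and dL/dz = a - y, i.e. the gradient is just the error:

import numpy as np

def sigmoid(z):
    return 1.0/(1.0+np.exp(-z))

y = 0.0
for z in [1.0,8.0]:                  # same two operating points as before
    a = sigmoid(z)
    grad_mse = (a-y)*a*(1-a)         # MSE: scaled by the activation gradient
    grad_ce  = a-y                   # cross-entropy: proportional to the error only
    print('a=%.4f  MSE dL/dz=%.6f  CE dL/dz=%.6f'%(a,grad_mse,grad_ce))
# The cross-entropy gradient stays large whenever the error is large,
# so the optimizer takes big steps exactly when it should.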
3. Log-likelihood loss function
For classification problems whose output layer uses the softmax function, the commonly used loss is the log-likelihood loss function. The log-likelihood loss is paired with softmax in the same way that the cross-entropy loss is paired with the sigmoid, and the two combinations are very similar. For binary classification, the log-likelihood loss reduces to the cross-entropy loss.
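A small check of that last claim (my own sketch, not from the post; the two-class logits are arbitrary): with two classes, softmax gives p1 = sigmoid(z1-z0), so the log-likelihood loss -log(p_true) equals the sigmoid cross-entropy on the logit difference:

import numpy as np

def softmax(z):
    e = np.exp(z-np.max(z))
    return e/e.sum()

def sigmoid(z):
    return 1.0/(1.0+np.exp(-z))

z = np.array([0.3,1.7])                  # arbitrary two-class logits, true class = 1
p = softmax(z)
loss_loglik = -np.log(p[1])              # log-likelihood loss on the softmax output
loss_ce = -np.log(sigmoid(z[1]-z[0]))    # cross-entropy with sigmoid on z1-z0
print(p, loss_loglik, loss_ce)           # the two losses agree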

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Load the MNIST dataset
# one_hot=True converts the labels to one-hot 0/1 vectors, e.g. [1,0,0,0,0,0,0,0,0,0]
mnist=input_data.read_data_sets('MNIST_data',one_hot=True)

# Define the batch size and the number of batches per epoch
batch_size=100
n_batch=mnist.train.num_examples//batch_size

# Define two placeholders
# The row dimension is None, so it can take any value; here it will be 100, i.e. batch_size
# The column dimension is 784, because each 28*28 input image is flattened to 1*784
x=tf.placeholder(tf.float32,[None,784])
y=tf.placeholder(tf.float32,[None,10])

# Define the weight and bias variables
w=tf.Variable(tf.zeros([784,10]))
b=tf.Variable(tf.zeros([10]))

# Define a single-layer neural network
# softmax converts the result of tf.matmul(x,w)+b into probabilities, for example:
# logits [9,2,1,1,2,1,1,2,1,1] become
# [0.99527,0.00091,0.00033,0.00033,0.00091,0.00033,0.00033,0.00091,0.00033,0.00033]
logits=tf.matmul(x,w)+b
prediction=tf.nn.softmax(logits)

# Define the loss function
# Since the output layer uses softmax, the cross-entropy loss converges faster than the MSE loss
# loss=tf.reduce_mean(tf.square(y-prediction))
# Note: softmax_cross_entropy_with_logits expects the unnormalized logits,
# not the softmax output, so pass logits rather than prediction
loss=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y,logits=logits))

# Define the optimizer
optimizer=tf.train.GradientDescentOptimizer(0.2)

# Define the training op; the optimizer adjusts the variables inside loss so that loss keeps decreasing
train=optimizer.minimize(loss)

# Compute the accuracy
# tf.argmax returns the index of the largest value along the given axis
# tf.equal compares its two arguments element-wise and returns True or False
correct_prediction=tf.equal(tf.argmax(y,1),tf.argmax(prediction,1))
# tf.cast converts the boolean values to floats
accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

with tf.Session() as sess:
	sess.run(tf.global_variables_initializer())
	# An epoch is one pass over all batches of the training set
	for epoch in range(20):
		for batch in range(n_batch):
			# Fetch batch_size samples for each training step
			batch_xs,batch_ys=mnist.train.next_batch(batch_size)
			sess.run(train,feed_dict={x:batch_xs,y:batch_ys})
		acc=sess.run(accuracy,feed_dict={x:mnist.test.images,y:mnist.test.labels})
		print('epoch=',epoch,' ','acc=',acc)

Operation result: the script prints the test-set accuracy after each epoch (screenshot omitted).

Origin blog.csdn.net/wxsy024680/article/details/114535778