[TensorFlow1.X] Series of Study Notes [Getting Started Four]

The algorithms in a large number of classic papers are implemented in TF 1.x, so a working knowledge of TF 1.x is needed to read them easily and to understand the implementation details.

[TensorFlow1.X] Series learning article directory



Preface

Having built a neural network, the next step is to learn the principles and implementation details of neural network optimization. This post explains the role of the loss function in detail.


The role of the loss function

The loss function plays an important role in deep learning. It is used to measure the difference between the model prediction results and the actual labels, guide the model training process, and is a key component in evaluating model performance and optimizing model parameters. The loss function has the following main functions:

  1. Measuring model performance: The loss function is used to measure the performance of the model on the given data. By calculating the difference between the predictions and the true labels, the accuracy and error size of the model can be evaluated.
  2. Reflecting the optimization goal: The loss function defines the goal of the optimization algorithm, which is to minimize the value of the loss function. By minimizing the loss function, the model can minimize the difference between the prediction results and the true labels and improve the performance of the model.
  3. Guiding parameter updates: during optimization, the gradient of the loss function guides the direction and magnitude of parameter updates. By computing the gradient of the loss with respect to the model parameters, the update direction can be determined so that the loss value gradually decreases (see the sketch after this list).
  4. Supports model selection and comparison: different loss functions are suitable for different tasks and problems. Choosing a loss function appropriate for the task can help the model learn and fit the data better. Furthermore, by comparing the performance of different models under the same loss function, the best model architecture and hyperparameters can be selected.
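As a minimal illustration of point 3, the sketch below (a toy scalar model, not from the original post) uses tf.gradients to show how the gradient of a squared-error loss with respect to a weight tells the optimizer which direction to move that weight:

#coding:utf-8
# Minimal sketch (assumed toy example): the gradient of the loss with respect
# to a parameter tells us which way to update it.
import tensorflow as tf

w = tf.Variable(5.0)                      # a single trainable parameter
x = tf.constant(2.0)
y_true = tf.constant(4.0)                 # the target is reached when w = 2
y_pred = w * x
loss = tf.square(y_pred - y_true)         # squared-error loss
grad = tf.gradients(loss, w)[0]           # d(loss)/dw
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([loss, grad]))         # large loss, positive gradient -> decrease w
    for _ in range(100):
        sess.run(train_step)
    print(sess.run([w, loss]))            # w approaches 2, loss approaches 0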

Common loss functions in deep learning include:

  • Mean Squared Error (MSE): Suitable for regression problems, measuring the average squared difference between the predicted value and the true value.
  • Cross Entropy: Suitable for classification problems, measuring the difference between the predicted probability distribution and the true probability distribution.

Choosing an appropriate loss function depends on the task type, data characteristics, and the problem the model is intended to solve. Different loss functions have different effects on the model training and optimization process, so they need to be carefully selected and adjusted to achieve the best results.


Mean Squared Error (MSE)

Mean squared error (MSE): the mean of the squared differences between the predicted values $y$ of n samples and the true values $y\_$.
$MSE(y\_, y) = \frac{\sum_{i=1}^{n}(y - y\_)^2}{n}$
In TensorFlow 1.X it is written as: loss_mse = tf.reduce_mean(tf.square(y_ - y))
For example, suppose the true value is $y\_ = (1, 0)$, the first neural network model predicts $y_1 = (0.7, 0.5)$, and the second predicts $y_2 = (0.8, 0.1)$; determine which model's prediction is closer to the standard answer.
According to the squared-error calculation (the sum of squared errors is compared here; the common factor $1/n$ is omitted since both models have the same n):
$MSE\_1((1,0),(0.7,0.5)) = (0.7 - 1)^2 + (0.5 - 0)^2 = 0.34$
$MSE\_2((1,0),(0.8,0.1)) = (0.8 - 1)^2 + (0.1 - 0)^2 = 0.05$
Since 0.34 > 0.05, the predicted value $y_2$ is closer to the true value $y\_$, so the second model's prediction is more accurate.
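As a quick check (a minimal sketch, not part of the original post), the TF 1.X one-liner above can be evaluated on the two example predictions. Note that tf.reduce_mean divides by the number of elements, so it returns 0.17 and 0.025, half of the hand-computed sums; the comparison between the two models is unchanged.

#coding:utf-8
# Minimal sketch: evaluate the MSE one-liner on the example above.
# tf.reduce_mean averages over the 2 elements, so the results are
# 0.34/2 = 0.17 and 0.05/2 = 0.025; the ranking of the two models is the same.
import tensorflow as tf

y_ = tf.constant([[1.0, 0.0]])            # true value
y1 = tf.constant([[0.7, 0.5]])            # prediction of model 1
y2 = tf.constant([[0.8, 0.1]])            # prediction of model 2

mse_1 = tf.reduce_mean(tf.square(y_ - y1))
mse_2 = tf.reduce_mean(tf.square(y_ - y2))

with tf.Session() as sess:
    print(sess.run([mse_1, mse_2]))       # approx [0.17, 0.025]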

For multi-classification problems, MSE is not a commonly used evaluation metric.

This method was used in both linear regression and nonlinear regression in the previous blog post [TensorFlow1.X Introduction 3].


Cross Entropy

Cross entropy: the distance between the probability distribution of the predicted values $y$ of n samples and that of the true values $y\_$. The larger the cross entropy, the farther apart and the more different the two probability distributions are; the smaller the cross entropy, the closer and more similar they are.
$H(y\_, y) = -\sum_{i=1}^{n} y\_ \cdot \log y$
In TensorFlow 1.X it is written as: loss_ce = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(y, 1e-12, 1.0)))
Using the same example as above:
$H\_1((1,0),(0.7,0.5)) = -(1 \times \log 0.7 + 0 \times \log 0.5) \approx 0.36$
$H\_2((1,0),(0.8,0.1)) = -(1 \times \log 0.8 + 0 \times \log 0.1) \approx 0.22$
Since 0.36 > 0.22, the predicted value $y_2$ is closer to the true value $y\_$, so the second model's prediction is more accurate.

Here the n output values of the classifier are not yet constrained to form a probability distribution, and each value is usually clipped to a range [min, max]: values smaller than min are set to min, and values larger than max are set to max. tf.clip_by_value(y, 1e-12, 1.0) clips the predictions to the interval (0, 1], which keeps the logarithm well defined. In the example above the prediction (0.7, 0.5) already lies inside this interval, so the clipped value of $H\_1$ is still approximately 0.36.
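A minimal sketch (not from the original post) that verifies the two hand-computed values; the per-sample sum over classes is used here so the numbers match the formula $H(y\_, y)$, whereas the one-liner above averages over all elements.

#coding:utf-8
# Minimal sketch: verify the hand-computed cross entropies above.
import tensorflow as tf

y_ = tf.constant([[1.0, 0.0]])            # true distribution
y1 = tf.constant([[0.7, 0.5]])            # prediction of model 1
y2 = tf.constant([[0.8, 0.1]])            # prediction of model 2

def cross_entropy(y_true, y_pred):
    # clip keeps log() away from log(0); the sum over classes matches H(y_, y)
    return -tf.reduce_sum(y_true * tf.log(tf.clip_by_value(y_pred, 1e-12, 1.0)), axis=1)

with tf.Session() as sess:
    print(sess.run([cross_entropy(y_, y1), cross_entropy(y_, y2)]))  # approx [0.36], [0.22]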

In deep learning, the model output is generally passed through the softmax function to obtain a probability distribution over the output classes, which is then compared with the true values to compute the cross entropy as the loss. The softmax function maps the n class outputs to a probability distribution satisfying $p(X = x_i) \in [0, 1]$ and $\sum_{i=1}^{n} p(X = x_i) = 1$; it is defined as $p(X = x_i) = \mathrm{softmax}(x_i) = \frac{e^{x_i}}{\sum_{i=1}^{n} e^{x_i}}$.
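A minimal sketch (not part of the original post) of the softmax values used in the next example, computed with NumPy:

#coding:utf-8
# Minimal sketch: compute the softmax of the two example outputs.
import numpy as np

def softmax(x):
    e = np.exp(x)
    return e / e.sum()

print(softmax([0.7, 0.5]))   # approx [0.55, 0.45]
print(softmax([0.8, 0.1]))   # approx [0.67, 0.33]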
In TensorFlow 1.X, softmax and cross entropy are combined into a single op: loss_ce = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1)))
Applying softmax to the example predictions first and then computing the cross entropy gives:
$H\_1((1,0), \mathrm{softmax}(0.7,0.5)) \approx H\_1((1,0),(0.55,0.45)) \approx 0.60$
$H\_2((1,0), \mathrm{softmax}(0.8,0.1)) \approx H\_2((1,0),(0.67,0.33)) \approx 0.40$
Since 0.60 > 0.40, the predicted value $y_2$ is closer to the true value $y\_$, so the second model's prediction is more accurate.
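A minimal sketch (not from the original post) evaluating the combined op on the example logits; tf.nn.sparse_softmax_cross_entropy_with_logits takes unnormalized logits and integer class labels and applies softmax internally.

#coding:utf-8
# Minimal sketch: the combined softmax + cross entropy op on the example logits.
import tensorflow as tf

logits = tf.constant([[0.7, 0.5],
                      [0.8, 0.1]])        # raw outputs of the two example models
labels = tf.constant([0, 0])              # true class index is 0, i.e. y_ = (1, 0)

ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels)

with tf.Session() as sess:
    print(sess.run(ce))                   # approx [0.60, 0.40]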
This method was used in the logistic regression in the previous blog post [TensorFlow1.X Introduction 3].


Custom loss function

Customize a reasonable loss function based on the actual situation of the problem.
For example, consider predicting the daily sales volume of yogurt: if the predicted volume is greater than the actual volume, cost is lost; if the predicted volume is less than the actual volume, profit is lost. In real life, the cost of producing a box of yogurt and the profit from selling one are usually not equal, so a custom loss function that fits the problem is needed: $loss = \sum_{i=1}^{n} f(y\_, y)$
The custom loss function is a piecewise function:
$f(y\_, y) = \begin{cases} profit \times (y\_ - y), & y < y\_ \\ cost \times (y - y\_), & y \ge y\_ \end{cases}$
If the prediction y is less than the standard answer y_, the loss is the profit multiplied by the difference between the standard answer y_ and the prediction y;
if the prediction y is greater than or equal to the standard answer y_, the loss is the cost multiplied by the difference between the prediction y and the standard answer y_.
In TensorFlow 1.X this is written as: loss = tf.reduce_sum(tf.where(tf.greater(y, y_), COST*(y - y_), PROFIT*(y_ - y)))
The complete code for the custom loss function in TensorFlow 1.X:

#coding:utf-8
# Yogurt cost is 8 yuan per box and profit is 2 yuan per box (COST and PROFIT below)
# Over-prediction loses cost and under-prediction loses profit; since COST > PROFIT here,
# the trained model tends to predict slightly less
# 0. Import modules and generate the data set
import tensorflow as tf
import numpy as np
BATCH_SIZE = 8
SEED = 23455
COST = 8
PROFIT = 2

rdm = np.random.RandomState(SEED)
X = rdm.rand(32,2)
# Assume the true sales volume is Y = 3*x1 + 2*x2 plus a small random noise in [-0.05, 0.05)
Y = [[3*x1+2*x2+(rdm.rand()/10.0-0.05)] for (x1, x2) in X]

# Define the network inputs, parameters and output, and the forward propagation process
x = tf.placeholder(tf.float32, shape=(None, 2))
y_ = tf.placeholder(tf.float32, shape=(None, 1))
# The trained weights w = (a, b) should be close to (3, 2)
w = tf.Variable(tf.random_normal([2, 1], stddev=1, seed=1))
y = tf.matmul(x, w)

# Define the loss function and the backpropagation method
loss = tf.reduce_sum(tf.where(tf.greater(y, y_), (y - y_)*COST, (y_ - y)*PROFIT))
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss)

# Create a session and train for STEPS steps
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    STEPS = 3000
    for i in range(STEPS):
        start = (i*BATCH_SIZE) % 32
        end = (i*BATCH_SIZE) % 32 + BATCH_SIZE
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y[start:end]})
        if i % 1000 == 0:
            print ("After %d training steps, w1 is: " % (i))
            print (sess.run(w), "\n")
    print ("Final w1 is: \n", sess.run(w))


The complete code using the mean squared error loss function:

Only the loss part needs to be modified:

#coding:utf-8
# Yogurt cost is 8 yuan per box and profit is 2 yuan per box (COST and PROFIT below)
# Over-prediction loses cost and under-prediction loses profit; since COST > PROFIT here,
# the trained model tends to predict slightly less
# 0. Import modules and generate the data set
import tensorflow as tf
import numpy as np
BATCH_SIZE = 8
SEED = 23455
COST = 8
PROFIT = 2

rdm = np.random.RandomState(SEED)
X = rdm.rand(32,2)
# Assume the true sales volume is Y = 3*x1 + 2*x2 plus a small random noise in [-0.05, 0.05)
Y = [[3*x1+2*x2+(rdm.rand()/10.0-0.05)] for (x1, x2) in X]

# Define the network inputs, parameters and output, and the forward propagation process
x = tf.placeholder(tf.float32, shape=(None, 2))
y_ = tf.placeholder(tf.float32, shape=(None, 1))
# The trained weights w = (a, b) should be close to (3, 2)
w = tf.Variable(tf.random_normal([2, 1], stddev=1, seed=1))
y = tf.matmul(x, w)

# Define the loss function and the backpropagation method
loss = tf.reduce_mean(tf.square(y_ - y))
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss)

# Create a session and train for STEPS steps
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    STEPS = 3000
    for i in range(STEPS):
        start = (i*BATCH_SIZE) % 32
        end = (i*BATCH_SIZE) % 32 + BATCH_SIZE
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y[start:end]})
        if i % 1000 == 0:
            print ("After %d training steps, w1 is: " % (i))
            print (sess.run(w), "\n")
    print ("Final w1 is: \n", sess.run(w))


From the execution results, the weights obtained with the custom loss function, (2.98, 1.98), are closer to the true values (3, 2) than the weights obtained with the mean squared error, (1.88, 2.83), and they better match the actual business requirement.

Summary

The loss function plays a very important role in machine learning and deep learning. It measures the difference, or error, between the model's predictions and the true labels. By minimizing the loss function, the model parameters are trained to better fit the training data and to make accurate predictions on new, unseen data. This post explained the common loss functions in deep learning and showed how to customize a reasonable loss function for the actual problem.


Origin blog.csdn.net/yangyu0515/article/details/133964538