A detailed explanation of the cross-entropy loss function

https://blog.csdn.net/b1055077005/article/details/100152102

https://blog.csdn.net/tsyccnh/article/details/79163834

 

Key points summarized from the articles:

Relative entropy (KL divergence)

If there are two separate probability distributions P(x) and Q(x) for the same random variable X (in machine learning, P(x) usually denotes the true distribution of the samples and Q(x) the distribution predicted by the model), the KL divergence can be used to measure the difference between the two distributions. The smaller the KL divergence, the closer Q(x) is to P(x); by repeatedly training the model, Q(x) can be made to approximate P(x).
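For reference, the standard discrete form of the KL divergence (the usual definition, not quoted from the linked posts) is

$$D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$$

which is zero exactly when Q(x) matches P(x) for every x.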

 

KL divergence = cross entropy − information entropy

When training a network in machine learning, the input data and labels are usually fixed, so the true probability distribution P(x) is fixed as well, and the information entropy is a constant. The KL divergence measures the difference between the true distribution P(x) and the predicted distribution Q(x): the smaller its value, the better the prediction, so training should minimize the KL divergence. Since the cross entropy equals the KL divergence plus a constant (the information entropy) and is simpler to compute, the cross-entropy loss function is commonly used to compute the loss in machine learning.
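Written out with the usual definitions (standard notation, not taken verbatim from the linked posts), this relation follows directly:

$$H(P, Q) = -\sum_{x} P(x) \log Q(x) = \underbrace{-\sum_{x} P(x) \log P(x)}_{H(P)\ \text{(information entropy)}} + \underbrace{\sum_{x} P(x) \log \frac{P(x)}{Q(x)}}_{D_{KL}(P \,\|\, Q)}$$

so $H(P, Q) = H(P) + D_{KL}(P \,\|\, Q)$. With the labels fixed, $H(P)$ is a constant, and minimizing the cross entropy $H(P, Q)$ is equivalent to minimizing the KL divergence.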

 

To sum up:

  • Cross entropy measures the degree of difference between two probability distributions over the same random variable. In machine learning it expresses the difference between the true probability distribution and the predicted probability distribution; the smaller the cross entropy, the better the model's predictions.
  • Cross entropy is usually paired with softmax in classification problems. Softmax normalizes the model's outputs so that the predicted values for all classes sum to 1, and the loss is then computed with cross entropy (a short NumPy sketch follows this list).
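As a minimal sketch of the softmax + cross-entropy combination described above (plain NumPy, with a hypothetical 3-class example; this is not code from the linked posts):

    import numpy as np

    def softmax(logits):
        # shift by the max for numerical stability, then normalize so the outputs sum to 1
        exp = np.exp(logits - np.max(logits))
        return exp / exp.sum()

    def cross_entropy(p_true, q_pred, eps=1e-12):
        # H(P, Q) = -sum_x P(x) * log Q(x); eps guards against log(0)
        return -np.sum(p_true * np.log(q_pred + eps))

    # hypothetical example: 3 classes, the true class is index 0 (one-hot P)
    logits = np.array([2.0, 1.0, 0.1])   # raw model outputs (assumed values)
    p_true = np.array([1.0, 0.0, 0.0])   # true distribution P(x)
    q_pred = softmax(logits)             # predicted distribution Q(x), sums to 1
    print(q_pred, cross_entropy(p_true, q_pred))

Because P is one-hot here, the loss reduces to -log Q(true class), which is the form classification frameworks typically compute.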

 

 

Original source: blog.csdn.net/weixin_43135178/article/details/115283607