A brief introduction to cross-entropy

What is cross-entropy?

Cross-entropy is a common measure in information theory, often used to quantify the difference between two probability distributions. In machine learning, cross-entropy is frequently used to measure the gap between the true probability distribution and the predicted probability distribution, and serves to evaluate the performance of classification models.

Suppose there are two probability distributions P and Q. Their cross-entropy is:

$$H(P, Q) = -\sum_{x} P(x) \log Q(x)$$

where $P(x)$ is the probability of event $x$ under the true distribution, $Q(x)$ is the probability of event $x$ under the predicted distribution, and $\log$ denotes the natural logarithm. The smaller the cross-entropy, the closer the predicted distribution is to the true distribution, and the better the model performs.
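As a quick illustration, here is a minimal sketch of this definition in Python with NumPy; the two distributions are made-up examples, not values from the text:

```python
import numpy as np

# Cross-entropy H(P, Q) = -sum_x P(x) * log Q(x), with the natural logarithm.
def cross_entropy(p, q, eps=1e-12):
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Clip q to avoid log(0) when a predicted probability is exactly zero.
    return -np.sum(p * np.log(np.clip(q, eps, 1.0)))

p_true = [0.7, 0.2, 0.1]      # "true" distribution P (illustrative)
q_good = [0.6, 0.25, 0.15]    # prediction close to P
q_bad  = [0.1, 0.1, 0.8]      # prediction far from P

print(cross_entropy(p_true, q_good))  # smaller value
print(cross_entropy(p_true, q_bad))   # larger value
```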

Cross Entropy in Machine Learning

In machine learning, cross-entropy is commonly used as the loss function for training classification models. Taking binary classification as an example, let $y$ be the true label and $p$ the model's predicted probability that $y=1$. The cross-entropy loss is then:

$$L = -\left[y \log(p) + (1-y)\log(1-p)\right]$$

When $y=1$, the loss reduces to $-\log(p)$, and when $y=0$, it reduces to $-\log(1-p)$. This can be read as follows: when the true label is $y=1$, the closer the predicted probability $p$ is to 1, the smaller the loss, and the further away it is, the larger the loss. Likewise, when the true label is $y=0$, the loss is small if $p$ is close to 0 and large otherwise.
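A minimal sketch of this binary cross-entropy loss, using illustrative values for $y$ and $p$:

```python
import numpy as np

# Binary cross-entropy for a single example: L = -[y*log(p) + (1-y)*log(1-p)].
def binary_cross_entropy(y, p, eps=1e-12):
    p = np.clip(p, eps, 1.0 - eps)  # keep log() finite at p = 0 or p = 1
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

print(binary_cross_entropy(1, 0.9))   # y=1, prediction near 1 -> small loss
print(binary_cross_entropy(1, 0.1))   # y=1, prediction near 0 -> large loss
print(binary_cross_entropy(0, 0.1))   # y=0, prediction near 0 -> small loss
```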

For multi-class classification, the cross-entropy loss can be written as:

$$L = -\sum_{i=1}^{C} y_i \log(p_i)$$

where $C$ is the number of classes, $y_i$ is the one-hot encoding of the true label, and $p_i$ is the model's predicted probability for class $i$. As before, the closer the model's predictions are to the true label, the smaller the loss; otherwise the loss is larger.
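The multi-class case can be sketched the same way; the one-hot label and probability vectors below are illustrative:

```python
import numpy as np

# Multi-class cross-entropy: L = -sum_i y_i * log(p_i), with y one-hot over C classes.
def categorical_cross_entropy(y_onehot, p, eps=1e-12):
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    return -np.sum(np.asarray(y_onehot, dtype=float) * np.log(p))

y = [0, 1, 0]                 # one-hot label: true class is index 1 (C = 3)
p_close = [0.1, 0.8, 0.1]     # prediction concentrated on the true class
p_far   = [0.7, 0.2, 0.1]     # prediction concentrated on a wrong class

print(categorical_cross_entropy(y, p_close))  # smaller loss
print(categorical_cross_entropy(y, p_far))    # larger loss
```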

An advantage of the cross-entropy loss is that it can be used not only for training simple classification models but also for training neural networks. In a neural network, the cross-entropy loss measures the difference between the network's output and the true label, and the network parameters are updated through the backpropagation algorithm to optimize the model. The loss is also smooth and convex in the model's predicted probabilities, which helps keep the optimization process stable and well-behaved.
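As a rough sketch of how this plays out in a network with a softmax output layer: the gradient of the cross-entropy loss with respect to the logits simplifies to $p - y$, which is the quantity backpropagation passes on to the earlier layers. The logits and label below are illustrative:

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)            # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 0.5, -1.0])  # raw network outputs for 3 classes (illustrative)
y = np.array([1.0, 0.0, 0.0])        # one-hot true label

p = softmax(logits)
loss = -np.sum(y * np.log(p))        # cross-entropy loss
grad_logits = p - y                  # gradient w.r.t. the logits, used by backpropagation

print(loss, grad_logits)
```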

Plot of the loss value as a function of the predicted probability
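A short matplotlib sketch that reproduces this kind of plot, showing $-\log(p)$ for $y=1$ and $-\log(1-p)$ for $y=0$:

```python
import numpy as np
import matplotlib.pyplot as plt

# Binary cross-entropy loss as a function of the predicted probability p.
p = np.linspace(0.001, 0.999, 500)

plt.plot(p, -np.log(p), label="y = 1: loss = -log(p)")
plt.plot(p, -np.log(1 - p), label="y = 0: loss = -log(1 - p)")
plt.xlabel("predicted probability p")
plt.ylabel("loss")
plt.legend()
plt.show()
```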

Origin: blog.csdn.net/mefocus/article/details/129481121