A comparative introduction to KL divergence and cross entropy

KL divergence (Kullback-Leibler divergence) and cross entropy are two concepts widely used in machine learning. Both compare how similar two probability distributions are, but they differ in important ways. This article explains and compares KL divergence and cross entropy in detail.


KL Divergence and Cross Entropy

KL divergence, also known as relative entropy, measures the difference between two probability distributions. It quantifies the average amount of extra information required when a distribution Q is used to approximate the true distribution P. The formula for KL divergence is as follows:

D_{KL}(P \| Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}

Here x is a possible event or state in the distribution, and P(x) and Q(x) are the probabilities of event x under the true distribution and under the distribution predicted by the model, respectively.
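
A minimal sketch of this formula (assuming NumPy and two hypothetical discrete distributions p and q defined over the same three events) computes the sum directly:

```python
import numpy as np

def kl_divergence(p, q):
    """KL(P || Q) = sum_x P(x) * log(P(x) / Q(x)) for discrete distributions (in nats)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with P(x) = 0 contribute nothing, by the 0 * log 0 = 0 convention
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# Hypothetical example: true distribution P and model distribution Q over three events
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q))  # small positive value: Q is close to P but not identical
```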

KL divergence has the following properties:

  • KL divergence is non-negative, that is, D_KL(P || Q) >= 0, with equality if and only if P and Q are exactly the same distribution.

  • KL divergence is not symmetric: in general, D_KL(P || Q) != D_KL(Q || P) (see the numerical check after this list).

  • KL divergence is not a metric, because it is neither symmetric nor does it satisfy the triangle inequality.
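
The following sketch checks these properties numerically (assuming SciPy is available; scipy.stats.entropy(p, q) returns the KL divergence when a second distribution is supplied):

```python
from scipy.stats import entropy  # entropy(p, q) computes KL(P || Q) when q is given

p = [0.9, 0.05, 0.05]
q = [1/3, 1/3, 1/3]

print(entropy(p, q))  # non-negative
print(entropy(q, p))  # generally a different value: KL divergence is not symmetric
print(entropy(p, p))  # 0.0: the divergence vanishes only for identical distributions
```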

In machine learning, KL divergence is often used to measure the difference between two probability distributions, for example to evaluate generative models in unsupervised learning.

Cross-entropy is another way to compare the similarity between two probability distributions. Its formula is as follows:

H(P, Q) = -\sum_{x} P(x) \log Q(x)

Here x is again a possible event or state, and P(x) and Q(x) are the probabilities of event x under the true distribution and under the model's predicted distribution, respectively. Cross-entropy measures the gap between the model's predicted distribution and the true distribution: it is the expected cost of encoding events drawn from P using a code optimized for Q.
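
A minimal sketch of this formula (assuming NumPy and the same hypothetical distributions as above):

```python
import numpy as np

def cross_entropy(p, q):
    """H(P, Q) = -sum_x P(x) * log(Q(x)) for discrete distributions (in nats)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with P(x) = 0 contribute nothing
    return -np.sum(p[mask] * np.log(q[mask]))

p = [0.5, 0.3, 0.2]  # true distribution P
q = [0.4, 0.4, 0.2]  # model distribution Q
print(cross_entropy(p, q))
```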

Cross-entropy has the following properties:

  • Cross-entropy is non-negative, that is, H(P, Q) >= 0, and it is bounded below by the entropy of the true distribution: H(P, Q) >= H(P).

  • The lower bound is attained only when the two distributions match: H(P, Q) = H(P) if and only if P = Q.

  • Like KL divergence, cross-entropy is not symmetric: in general, H(P, Q) != H(Q, P).

  • Cross-entropy is not a metric, because it is neither symmetric nor does it satisfy the triangle inequality (the sketch after this list checks these properties numerically).
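
A minimal numerical check of these properties (assuming NumPy and two hypothetical discrete distributions):

```python
import numpy as np

def entropy_of(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return -np.sum(p[mask] * np.log(q[mask]))

p = [0.7, 0.2, 0.1]
q = [0.2, 0.5, 0.3]

print(entropy_of(p))         # H(P): the lower bound
print(cross_entropy(p, q))   # H(P, Q) >= H(P)
print(cross_entropy(p, p))   # equals H(P): the bound is tight only at Q = P
print(cross_entropy(q, p))   # generally differs from H(P, Q): not symmetric
```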

In machine learning, cross-entropy is often used to measure the difference between model predictions and true labels. For example, in classification tasks, cross-entropy is used as a loss function to measure the difference between the class distribution predicted by the model and the true labels.

Relationship between KL divergence and cross entropy

KL divergence is closely related to cross-entropy. In probability theory, the KL divergence between two distributions can be written as their cross-entropy minus the entropy of the true distribution. Specifically:

D_{KL}(P \| Q) = H(P, Q) - H(P)

Here H(P, Q) is the cross entropy of P and Q, and H(P) = -\sum_x P(x) \log P(x) is the entropy of P. KL divergence therefore combines the concepts of cross entropy and entropy, which is why the two quantities are so closely related. In particular, when P is the fixed true distribution, H(P) is a constant, so minimizing the cross entropy with respect to Q is equivalent to minimizing the KL divergence.
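
The identity can be verified numerically; a minimal sketch (assuming NumPy and hypothetical distributions with full support, so every log is well defined):

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])  # true distribution P
q = np.array([0.4, 0.4, 0.2])  # model distribution Q

entropy_p = -np.sum(p * np.log(p))      # H(P)
cross_ent = -np.sum(p * np.log(q))      # H(P, Q)
kl        =  np.sum(p * np.log(p / q))  # D_KL(P || Q)

# The identity D_KL(P || Q) = H(P, Q) - H(P) holds up to floating-point error.
print(np.isclose(kl, cross_ent - entropy_p))  # True
```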

Application of KL Divergence and Cross Entropy

Cross-entropy is often used in supervised learning tasks such as classification. In these tasks, we have a set of input samples and corresponding labels, and we want to train a model that maps input samples to the correct labels.

In this case, we can use cross entropy as the loss function. Suppose the true labels follow a distribution p and the model predicts an output distribution q. Then the cross entropy is:

H(p, q) = -\sum_{i} p_i \log q_i

Here i indexes the possible classes; p_i is the probability of class i under the true label distribution (often a one-hot vector), and q_i is the probability the model assigns to class i.
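
A minimal sketch of cross-entropy as a classification loss (assuming NumPy, hypothetical one-hot labels, and hypothetical model probabilities such as softmax outputs):

```python
import numpy as np

def cross_entropy_loss(true_probs, pred_probs, eps=1e-12):
    """Mean cross-entropy over a batch: -sum_i p_i * log(q_i), averaged over samples."""
    pred_probs = np.clip(pred_probs, eps, 1.0)  # avoid log(0)
    per_sample = -np.sum(true_probs * np.log(pred_probs), axis=1)
    return per_sample.mean()

# Hypothetical 3-class batch: one-hot true labels p and softmax outputs q
p = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
q = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])

print(cross_entropy_loss(p, q))  # lower is better; 0 only for fully confident, correct predictions
```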

KL divergence is often used in unsupervised learning tasks such as clustering, dimensionality reduction, and generative modeling. In these tasks there is no label information, so cross entropy against labels cannot be used to evaluate the model. Instead, we need a way to measure how far the distribution predicted by the model is from the true distribution, and KL divergence serves exactly this purpose. Its formula is:

D_{KL}(p \| q) = \sum_{i} p_i \log \frac{p_i}{q_i}

Here i indexes the possible events or states; p_i and q_i are the probabilities of event i under the true distribution and under the model's predicted distribution, respectively. The smaller the KL divergence, the closer the model's distribution is to the true one.
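
As an illustrative sketch in this setting (assuming NumPy, a known discrete data distribution, and two hypothetical candidate models), KL divergence can rank models by how closely their distributions match the data:

```python
import numpy as np

def kl_divergence(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# Hypothetical data distribution and two candidate model distributions
p_data    = [0.5, 0.3, 0.2]
q_model_a = [0.45, 0.35, 0.20]
q_model_b = [0.20, 0.30, 0.50]

# The model with the smaller D_KL(p_data || q) fits the data distribution better.
print(kl_divergence(p_data, q_model_a))
print(kl_divergence(p_data, q_model_b))
```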

In general, cross-entropy is typically used in supervised learning and KL divergence in unsupervised learning: when label information is available, cross entropy evaluates the model against the labels; when it is not, KL divergence measures the gap between the distribution predicted by the model and the true distribution.

Summary

In this article, we introduced KL divergence and cross entropy and compared their similarities and differences. KL divergence measures the difference between two probability distributions, while cross-entropy measures the difference between the model's predictions and the true labels. Although closely related, they differ in how they are used and applied. In machine learning, both KL divergence and cross-entropy are widely used to evaluate model performance and to update model parameters as loss functions.

Source: deepHub IMBA

Origin: blog.csdn.net/qq_33431368/article/details/130397363