Common loss functions (mean squared error, cross entropy)

One. Mean Squared Error

Mean squared error is the mean of the squared errors between the predicted values and the true values at corresponding points; it is commonly used in linear regression problems.

Formula: loss=\frac{1}{2m}\sum\limits_{i=1}^{m}{(y_{i}-\hat{y}_{i})^{2}}

m represents the sample size.
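As a minimal sketch of this formula (NumPy assumed; the function name mse_loss and the sample arrays are just illustrative):

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean squared error with the 1/(2m) scaling from the formula above."""
    m = len(y_true)
    return np.sum((y_true - y_pred) ** 2) / (2 * m)

# Four samples from a toy regression problem
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.0, 6.5])
print(mse_loss(y_true, y_pred))
```

The factor 1/2 only simplifies the gradient; many libraries scale by 1/m instead, which does not change where the minimum is.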

Two. Cross Entropy

An interesting relationship: the KL divergence of A and B equals the cross entropy of A and B minus the entropy of A (this is derived below).

Information theory

1. The amount of information

Meaning: the lower the probability p(x) of an event, the more information its occurrence carries.

Formula: I(x)=-\log(p(x))

That is, the information content is the negative logarithm of the event's probability.

2. Entropy

Meaning: the expected value of the information content over all possible outcomes.

Formula: H(X)=-\sum\limits_{i=1}^{n}{p(x_{i})\log(p(x_{i}))}

The greater the uncertainty, the greater the entropy.
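A small sketch tying the two definitions together (NumPy, natural logarithm; the helper names and distributions are illustrative):

```python
import numpy as np

def information(p):
    """Information content of an event with probability p: I(x) = -log(p(x))."""
    return -np.log(p)

def entropy(p):
    """Entropy of a discrete distribution: H(X) = -sum p(x_i) * log(p(x_i))."""
    p = np.asarray(p)
    return -np.sum(p * np.log(p))

# A rare event carries more information than a common one
print(information(0.01), information(0.99))

# A uniform distribution is more uncertain, hence higher entropy
print(entropy([0.25, 0.25, 0.25, 0.25]))  # ~1.386
print(entropy([0.97, 0.01, 0.01, 0.01]))  # ~0.168
```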

3. Relative entropy (Kullback-Leibler (KL) divergence)

Meaning: it measures the difference between two probability distributions P(x) and Q(x) of the same random variable x. The smaller the KL divergence, the closer the two distributions are; it behaves somewhat like a distance between them.

Formula: D_{KL}(p||q)=\sum\limits_{i=1}^{n}{p(x_{i})\log\left(\frac{p(x_{i})}{q(x_{i})}\right)}
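A minimal sketch of this formula (NumPy; the distributions p, q_close and q_far are made-up examples):

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) = sum p(x_i) * log(p(x_i) / q(x_i))."""
    p, q = np.asarray(p), np.asarray(q)
    return np.sum(p * np.log(p / q))

p = [0.7, 0.2, 0.1]          # "true" distribution
q_close = [0.6, 0.3, 0.1]    # similar to p
q_far = [0.1, 0.2, 0.7]      # very different from p

print(kl_divergence(p, q_close))  # small value: distributions are close
print(kl_divergence(p, q_far))    # larger value: distributions differ more
```

Note that KL divergence is not symmetric (D_KL(p||q) generally differs from D_KL(q||p)), so it is only "like" a distance, not a true distance metric.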

4. Cross entropy

Meaning: Like KL divergence, it measures the difference between two distributions and is often used in classification problems.

Formula: H(p,q)=-\sum\limits_{i=1}^{n}{p(x_{i})\log(q(x_{i}))}
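A sketch of this formula on a classification-style example (NumPy; the one-hot target and the two predictions are made-up values):

```python
import numpy as np

def cross_entropy(p, q):
    """H(p, q) = -sum p(x_i) * log(q(x_i))."""
    p, q = np.asarray(p), np.asarray(q)
    return -np.sum(p * np.log(q))

# In classification, p is usually a one-hot target and q the predicted probabilities
p = [0.0, 1.0, 0.0]            # true class is the second one
q_good = [0.05, 0.90, 0.05]    # confident, correct prediction
q_bad  = [0.60, 0.30, 0.10]    # most mass on the wrong class

print(cross_entropy(p, q_good))  # ~0.105 (low loss)
print(cross_entropy(p, q_bad))   # ~1.204 (high loss)
```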

Why use cross entropy in deep learning

Purpose: to measure the gap between the target distribution and the predicted distribution.

Using cross entropy as a loss function in deep learning is effectively no different from using KL divergence, and cross entropy is easier to compute.

Expanding the KL divergence formula:

D_{KL}(p||q)=\sum\limits_{i=1}^{n}{p(x_{i})\log\left(\frac{p(x_{i})}{q(x_{i})}\right)} \\
=\sum\limits_{i=1}^{n}{p(x_{i})[\log(p(x_{i}))-\log(q(x_{i}))]} \\
=\sum\limits_{i=1}^{n}{p(x_{i})\log(p(x_{i}))}-\sum\limits_{i=1}^{n}{p(x_{i})\log(q(x_{i}))} \\
=-H(p(x))+\left[-\sum\limits_{i=1}^{n}{p(x_{i})\log(q(x_{i}))}\right]

The second term is exactly the cross entropy H(p, q), while the entropy H(p(x)) of the target distribution is a constant that does not depend on the model's predictions.

Note: For deep learning optimization, minimizing KL divergence is equivalent to minimizing cross entropy.
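A quick numerical check of the identity derived above, D_KL(p||q) = H(p,q) - H(p), reusing the same helper definitions (the distributions are illustrative values):

```python
import numpy as np

def entropy(p):
    p = np.asarray(p)
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):
    p, q = np.asarray(p), np.asarray(q)
    return -np.sum(p * np.log(q))

def kl_divergence(p, q):
    p, q = np.asarray(p), np.asarray(q)
    return np.sum(p * np.log(p / q))

p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]

# D_KL(p || q) = H(p, q) - H(p): the H(p) term does not depend on q,
# so minimizing cross entropy over q also minimizes the KL divergence.
print(kl_divergence(p, q))
print(cross_entropy(p, q) - entropy(p))  # same value up to floating-point error
```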

Single-class cross entropy

Multi-class cross entropy

Source: blog.csdn.net/qq_41750911/article/details/124075295