Cross Entropy in Machine Learning

1. Information content

The smaller the probability of an event, the greater its information content. For an event $x_0$ of a random variable $X$, the information content is:
$$I(x_0) = -\log(p(x_0))$$
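As a quick illustration (a minimal NumPy sketch of the formula above; the probability values are made-up examples, not from the original post):

```python
import numpy as np

# Information content I(x) = -log(p(x)):
# the rarer the event, the more information it carries.
for p in (0.99, 0.5, 0.01):  # hypothetical example probabilities
    print(f"p = {p:<5}: I(x) = {-np.log(p):.4f}")
```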

2. Entropy

Entropy is the expected information content over all possible events:
$$H(X) = -\sum_{i=1}^{n} p(x_i)\log(p(x_i))$$
where the random variable $X$ has $n$ possible events.
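A minimal sketch in NumPy (the helper name `entropy` and the example distributions are my own, not from the post):

```python
import numpy as np

def entropy(p):
    """H(X) = -sum_i p(x_i) * log(p(x_i)) for a discrete distribution."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p))

# hypothetical example distributions
print(entropy([0.5, 0.5]))  # ~0.6931 = log(2), the most uncertain two-event case
print(entropy([0.9, 0.1]))  # ~0.3251, a more predictable variable has lower entropy
```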

3. Relative entropy (KL divergence)

$$D_{KL}(p \| q) = \sum_{i=1}^{n} p(x_i)\log\left(\frac{p(x_i)}{q(x_i)}\right)$$
Physical meaning: the information gain obtained when $p$ is used to describe the target problem instead of $q$.

In machine learning, $p$ usually represents the true distribution of the samples and $q$ the distribution predicted by the model; the smaller the relative entropy, the closer $q$ is to $p$.
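A small sketch of the formula (the distributions `p`, `q1`, `q2` below are hypothetical examples I chose, not from the post):

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p(x_i) * log(p(x_i) / q(x_i))."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

p  = [0.7, 0.2, 0.1]    # "true" distribution (hypothetical)
q1 = [0.6, 0.25, 0.15]  # a prediction close to p
q2 = [0.1, 0.2, 0.7]    # a prediction far from p
print(kl_divergence(p, q1))  # ~0.023, small: q1 is close to p
print(kl_divergence(p, q2))  # ~1.17, much larger: q2 is far from p
```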

4. Cross entropy

Relative entropy can be rewritten as:
$$D_{KL}(p \| q) = -H(p(x)) + \left[-\sum_{i=1}^{n} p(x_i)\log(q(x_i))\right]$$
The first term is the negative entropy of $p$, and the second term is the cross entropy:
$$H(p, q) = -\sum_{i=1}^{n} p(x_i)\log(q(x_i))$$
In machine learning we need to evaluate the gap between the labels and the predictions, which is what KL divergence measures. But the first term, the entropy of $p$, is fixed by the data, so during optimization we only need to focus on the cross-entropy term. This is why cross entropy is generally used directly as the loss function in machine learning.
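The identity can be checked numerically; in this sketch the distributions are hypothetical and the helper names (`entropy`, `cross_entropy`, `kl_divergence`) are my own:

```python
import numpy as np

def entropy(p):
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):
    """H(p, q) = -sum_i p(x_i) * log(q(x_i))."""
    return -np.sum(p * np.log(q))

def kl_divergence(p, q):
    return np.sum(p * np.log(p / q))

p = np.array([0.7, 0.2, 0.1])  # true label distribution (hypothetical)
q = np.array([0.5, 0.3, 0.2])  # model's predicted distribution (hypothetical)

# D_KL(p || q) = H(p, q) - H(p): since H(p) does not depend on the model,
# minimizing the cross entropy is equivalent to minimizing the KL divergence.
print(kl_divergence(p, q))               # ~0.0851
print(cross_entropy(p, q) - entropy(p))  # same value
```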

Applications of cross entropy in machine learning

1. Why use cross entropy as the loss function

  • In linear regression, MSE is often used as the loss function, but it does not work well for logistic classification, which is where cross entropy is needed (a short derivation follows).
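One common justification, sketched here rather than spelled out in the original post: with a sigmoid output $\hat{y} = \sigma(z)$, the gradient of the MSE loss with respect to the logit $z$ carries the factor $\sigma(z)(1 - \sigma(z))$, which is close to zero when the prediction saturates, so learning stalls; the cross-entropy gradient is just the plain error:
$$\frac{\partial}{\partial z}\,\frac{1}{2}\big(\sigma(z) - y\big)^2 = \big(\sigma(z) - y\big)\,\sigma(z)\big(1 - \sigma(z)\big)$$
$$\frac{\partial}{\partial z}\Big[-y\log\sigma(z) - (1 - y)\log\big(1 - \sigma(z)\big)\Big] = \sigma(z) - y$$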

2. Cross entropy in single-label classification

  • Single-label here means that each sample belongs to exactly one category.
  • The cross-entropy loss for a single-label classification problem is:
    $$loss = -\sum_{j=1}^{m}\sum_{i=1}^{n} y_{ji}\log(\hat{y}_{ji})$$
    where $m$ is the number of samples and $n$ the number of classes.
  • Here the predicted probabilities are computed by softmax, so they sum to one across the classes (a sketch follows this list).
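A minimal NumPy sketch of this loss (the function names and sample values are mine, not from the post; in practice the sum is usually averaged over the batch):

```python
import numpy as np

def softmax(logits):
    # subtract the row-wise max for numerical stability; each row sums to 1
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def single_label_cross_entropy(y_true, logits):
    """loss = -sum_j sum_i y_ji * log(y_hat_ji), with y_hat from softmax."""
    y_hat = softmax(logits)
    return -np.sum(y_true * np.log(y_hat))

# two hypothetical samples, three classes; each sample has exactly one positive label
y_true = np.array([[1, 0, 0],
                   [0, 0, 1]])
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.3, 3.0]])
print(single_label_cross_entropy(y_true, logits))
```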

3. Cross entropy in multi-label classification

  • Multi-label here means that each sample can belong to several categories at once.
  • The cross-entropy loss for a multi-label classification problem is:
    $$loss = \sum_{j=1}^{m}\sum_{i=1}^{n} -y_{ji}\log(\hat{y}_{ji}) - (1 - y_{ji})\log(1 - \hat{y}_{ji})$$
  • Here the predictions are computed by sigmoid; each label follows its own independent distribution, so the outputs are not jointly normalized across classes (a sketch follows this list).
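A matching NumPy sketch (the function names and values are hypothetical, not from the post):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multi_label_cross_entropy(y_true, logits):
    """loss = sum_j sum_i -y_ji*log(y_hat_ji) - (1 - y_ji)*log(1 - y_hat_ji)."""
    y_hat = sigmoid(logits)  # each label squashed to (0, 1) independently
    return np.sum(-y_true * np.log(y_hat) - (1 - y_true) * np.log(1 - y_hat))

# two hypothetical samples, three labels; a sample may carry several positive labels
y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
logits = np.array([[1.5, -2.0, 0.8],
                   [-1.0, 2.5, -0.5]])
print(multi_label_cross_entropy(y_true, logits))
```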

Source: www.cnblogs.com/yzh1024/p/11262900.html