Entropy Notes

Entropy

Entropy measures the uncertainty of an experiment. A deterministic experiment is completely predictable, such as a coin flip with P(H) = 1, and has zero entropy. A completely random experiment, such as rolling dice, is the most unpredictable; it has the greatest uncertainty and therefore the highest entropy.

H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)

If the logarithm is taken in base 2, the unit of entropy is the bit; in base e, the unit is the nat; in base 10, the unit is the hartley (Hart).

Entropy is also known as information entropy or Shannon entropy.
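As a quick illustration of the formula above, here is a minimal Python sketch; the coin and die distributions are just the examples mentioned earlier:

```python
import math

def entropy(probs, base=2):
    """Shannon entropy: sum of -p(x_i) * log_base p(x_i); terms with p = 0 contribute 0."""
    return sum(-p * math.log(p, base) for p in probs if p > 0)

# Deterministic coin flip with P(H) = 1: zero entropy, fully predictable.
print(entropy([1.0, 0.0]))               # 0.0 bits

# Fair six-sided die: the most uncertain experiment over 6 outcomes.
print(entropy([1 / 6] * 6))              # log2(6) ≈ 2.585 bits

# The same distribution measured in nats (base e) instead of bits.
print(entropy([1 / 6] * 6, base=math.e))
```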


Principle of Maximum Entropy

It appears, for example, in building decision trees; in practice it is implemented by solving for the extremum of a function subject to constraints.

The essence of the maximum entropy principle: the probability distribution over the system's events should satisfy all known constraints while assuming nothing about the unknown information; that is, treat whatever is unknown as equally probable.
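A minimal sketch of the constrained-extremum view, assuming a hypothetical constraint (a die whose average roll is known to be 4.5): maximizing entropy subject to it spreads probability as evenly as the constraint allows.

```python
import numpy as np
from scipy.optimize import minimize

faces = np.arange(1, 7)

def neg_entropy(p):
    # Negative Shannon entropy (natural log); minimizing it maximizes entropy.
    p = np.clip(p, 1e-12, 1.0)
    return float(np.sum(p * np.log(p)))

constraints = [
    {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},          # probabilities sum to 1
    {"type": "eq", "fun": lambda p: np.sum(p * faces) - 4.5},  # assumed constraint: mean roll is 4.5
]

result = minimize(neg_entropy, x0=np.full(6, 1 / 6), method="SLSQP",
                  bounds=[(0.0, 1.0)] * 6, constraints=constraints)
print(result.x)  # tilted toward high faces, but otherwise as flat (uniform) as the constraint allows
```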


Cross Entropy

Cross entropy is used to compare two probability distributions. It tells us how similar the two distributions are.

H(P, Q) = -\sum_{x} p(x) \log q(x)

It is a common loss function.
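A brief sketch of the formula used as a loss (the true and predicted distributions below are made up for illustration): the cross entropy is small when the prediction Q puts its mass where the true distribution P does, and large otherwise.

```python
import math

def cross_entropy(p, q):
    """H(P, Q) = -sum_x p(x) * log q(x), here in nats."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

p_true = [1.0, 0.0, 0.0]    # true label as a one-hot distribution (class 0)
q_good = [0.9, 0.05, 0.05]  # confident, correct prediction
q_bad  = [0.2, 0.5, 0.3]    # poor prediction

print(cross_entropy(p_true, q_good))  # ≈ 0.105 -> low loss
print(cross_entropy(p_true, q_bad))   # ≈ 1.609 -> high loss
```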


Mutual Information

Mutual information measures the interdependence between the probability distributions of two random variables. It tells us how much information about one variable is carried by the other variable.

Mutual information captures dependencies between random variables and is more general than the ordinary correlation coefficient, which only captures linear relationships.

The mutual information of two discrete random variables X and Y is defined as:

I(X; Y) = \sum_{y \in Y} \sum_{x \in X} p(x, y) \log\frac{p(x, y)}{p(x) p(y)}

In Bayesian networks, mutual information can be used to determine the structure of the relationships between variables.
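A small sketch of the definition, using a made-up joint distribution of two binary variables; the marginals are computed from the joint table and plugged into the double sum above.

```python
import numpy as np

# Hypothetical joint distribution p(x, y); rows index x, columns index y.
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])

px = joint.sum(axis=1)  # marginal p(x)
py = joint.sum(axis=0)  # marginal p(y)

mi = 0.0
for i in range(joint.shape[0]):
    for j in range(joint.shape[1]):
        pxy = joint[i, j]
        if pxy > 0:
            mi += pxy * np.log2(pxy / (px[i] * py[j]))

print(mi)  # ≈ 0.278 bits: knowing X tells us something about Y
```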


Kullback-Leibler (KL) Divergence

KL divergence is another way to measure the similarity between two probability distributions. It measures how much one distribution differs from another.

Suppose we have some data whose true distribution is P.

D_{KL}(P||Q) = \sum_x p(x) \log\frac{p(x)}{q(x)}

The KL divergence between P and Q tells us how much information we lose when we use Q to approximate the data's true distribution P.

KL divergence is also called the relative entropy.

The relationship between relative entropy and cross entropy:
\begin{aligned}
D_{KL}(P||Q) &= \sum_x p(x) \log\frac{p(x)}{q(x)} \\
&= \sum_x p(x) \log p(x) - \sum_x p(x) \log q(x) \\
&= -H(P) + H(P, Q)
\end{aligned}

It can further be seen that KL divergence is asymmetric, i.e., D_{KL}(P||Q) \neq D_{KL}(Q||P).
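A short sketch (with illustrative distributions P and Q) showing the definition, the asymmetry, and the identity D_KL(P||Q) = H(P, Q) - H(P) derived above:

```python
import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl(p, q):
    """D_KL(P || Q) = sum_x p(x) * log(p(x) / q(x)), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

P = [0.7, 0.2, 0.1]
Q = [0.5, 0.3, 0.2]

print(kl(P, Q))                          # information lost when Q approximates P
print(kl(Q, P))                          # a different value: KL divergence is asymmetric
print(cross_entropy(P, Q) - entropy(P))  # equals kl(P, Q), matching the derivation above
```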


Jensen-Shannon (JS) Divergence

A variant of the KL divergence that corrects its asymmetry and bounds its range of values; I have not used it in practice.
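For completeness, a sketch of the standard definition (not worked through in the original notes): JS(P, Q) = 1/2 D_KL(P || M) + 1/2 D_KL(Q || M) with M = (P + Q) / 2; it is symmetric and, with log base 2, bounded in [0, 1].

```python
import math

def kl(p, q):
    # D_KL(P || Q) in bits (log base 2).
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution M
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

P = [0.7, 0.2, 0.1]
Q = [0.5, 0.3, 0.2]

print(js(P, Q), js(Q, P))  # equal values: JS divergence is symmetric, unlike KL
```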


Origin: blog.csdn.net/Excaliburrr/article/details/93738970