Cross entropy and the likelihood function
Reprinted from: https://zhuanlan.zhihu.com/p/70804197
Entropy
- Entropy is a measure of the amount of information required to eliminate uncertainty
- Entropy is the degree of uncertainty of the information
- The smaller the entropy, the more certain the information
- \(\text{Entropy} = \sum\limits_{x=1}^{n} \bigl(\text{probability that } x \text{ occurs} \times \text{amount of information needed to verify } x\bigr)\)
- If someone says "China canceled the college entrance examination this year," we are very uncertain about it (we may even suspect it is nonsense), so we need a lot of information to verify it; conversely, if someone says "the college entrance examination is held as usual this year," we think "that's normal" and need little verification, so the amount of information required is small.
- Given the true distribution, we can find an optimal strategy that eliminates the system's uncertainty at the lowest cost; this minimum cost is the entropy
- The lower an event's probability, the more information is needed to verify it, so the amount of information required is inversely related to the probability; below we derive a mathematical expression for this (a quick numerical sketch follows this list).
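As a minimal sketch of the entropy formula above, in plain Python with made-up probabilities for the exam example (it measures information as \(-\log_2 p(x)\), the expression derived in the paragraphs below):

```python
import math

def entropy(probs):
    """Entropy = sum over outcomes of p(x) * (amount of information
    needed to verify x), measuring information in bits as -log2 p(x)."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Made-up probabilities for the exam example above:
# a near-certain outcome leaves little uncertainty to eliminate,
print(entropy([0.99, 0.01]))  # ~0.081 bits
# while a 50/50 outcome leaves the most (for two outcomes).
print(entropy([0.5, 0.5]))    # 1.0 bit
```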
Consider a discrete random variable \(x\). The amount of information depends on the probability distribution \(p(x)\), so we want to find a function \(h(x)\) that is a monotonically decreasing function of the probability (because the larger \(p(x)\) is, the less information is needed) and that represents the amount of information.
How do we find it? If we have two unrelated events \(x\) and \(y\), the amount of information gained from observing the two events occurring together should equal the sum of the information gained from observing each event separately, namely: \(h(x, y) = h(x) + h(y)\)
Because the two events are independent and unrelated, \(p(x, y) = p(x)\,p(y)\)
From these two relations, it is easy to see that \(h(x)\) must be related to the logarithm of \(p(x)\).
By the laws of logarithms: \(\log\bigl(p(x)\,p(y)\bigr) = \log p(x) + \log p(y)\)
So we have: \(h(x) = -\log p(x)\)
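It is easy to verify that this choice satisfies both requirements: \(h(x, y) = -\log\bigl(p(x)\,p(y)\bigr) = -\log p(x) - \log p(y) = h(x) + h(y)\), and \(-\log p(x)\) is monotonically decreasing in \(p(x)\).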
The negative sign ensures that the amount of information is non-negative (an event with probability 1 yields zero information). The choice of logarithm base is arbitrary: information theory commonly uses base 2, in which case the unit of information is the bit; machine learning often uses the natural constant \(e\), in which case the unit is called the nat. \(h(x)\) is also known as the self-information of a random variable: it describes the amount of information produced by the occurrence of one particular outcome of the random variable.
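A minimal sketch of self-information in plain Python (the probabilities are illustrative, echoing the exam example above):

```python
import math

def self_information(p, base=2.0):
    """Self-information h(x) = -log p(x) of an outcome with probability p;
    base 2 gives bits, base e gives nats."""
    return -math.log(p, base)

# Rare events carry more information than common ones:
print(self_information(0.01))  # ~6.64 bits ("the exam was canceled")
print(self_information(0.99))  # ~0.014 bits ("the exam is held as usual")
# The same quantity can be expressed in nats:
print(self_information(0.5, base=math.e))  # ~0.693 nats
```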