Cross entropy and the likelihood function

Reprinted from: https://zhuanlan.zhihu.com/p/70804197

Entropy

  • Entropy is a measure of the amount of information needed to eliminate uncertainty
    • Entropy is the degree of uncertainty of the information
    • The smaller the entropy, the more determined the information is
  • \( \text{Entropy} = \sum\limits_{x=1}^{n} (\text{probability that } x \text{ occurs}) \times (\text{amount of information needed to verify } x) \) (a numerical sketch follows this list)
    • "China canceled the college entrance examination this year" -- this statement is highly uncertain (in my heart I still think it is nonsense), so we have to check it, which requires a lot of information (to verify it). Conversely, if the statement is "the college entrance examination is held as usual this year", I think: that is normal, little verification is needed, so the amount of information required is very small.
  • Given the true distribution of the information, we can find an optimal strategy that eliminates the uncertainty at the smallest possible cost; that minimal cost is the entropy.
  • The lower the probability of an event, the more information is needed to verify it, so the amount of information required for verification is inversely proportional to the probability. A mathematical expression describing this is derived below.
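As a small illustration of the formula in the list above, here is a minimal Python sketch; the function name `entropy` and the example probabilities are assumptions made for illustration only, and base 2 (bits) is used, a choice discussed at the end of the derivation below.

```python
import math

def entropy(probs, base=2):
    """Entropy = sum over x of p(x) * (-log p(x)).

    `probs` is a hypothetical discrete distribution (should sum to 1);
    base 2 gives bits, base e gives nats.
    """
    return sum(-p * math.log(p, base) for p in probs if p > 0)

# Made-up example: "exam canceled" is very unlikely, "exam held as usual" very likely.
print(entropy([0.01, 0.99]))   # low uncertainty -> small entropy (~0.08 bits)
print(entropy([0.5, 0.5]))     # maximum uncertainty for two outcomes -> 1 bit
```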

Consider a discrete random variable \(x\). The measure of information content depends on the probability distribution \(p(x)\), so we are looking for a function \(h(x)\) that is a monotonically decreasing function of \(p(x)\) (because the larger \(p(x)\) is, the less information is needed to verify it) and that represents the amount of information.

How do we find it? If we have two unrelated events \(x\) and \(y\), then the amount of information gained from observing the two events occurring together should equal the sum of the information gained from observing each event separately, namely:
\( h(x, y) = h(x) + h(y) \)

Because the two events are independent and unrelated, we also have
\( p(x, y) = p(x)\,p(y) \)

From these two relations it is easy to see that \(h(x)\) must be related to the logarithm of \(p(x)\): \(h\) has to turn the product \(p(x)\,p(y)\) into the sum \(h(x) + h(y)\), which is exactly what the logarithm does:
\( \log\big(p(x)\,p(y)\big) = \log p(x) + \log p(y) \)

So we have
\( h(x) = -\log p(x) \)
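As a quick check, this choice does satisfy the additivity requirement for independent events stated above:
\( h(x, y) = -\log p(x, y) = -\log\big(p(x)\,p(y)\big) = -\log p(x) - \log p(y) = h(x) + h(y) \)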

Here the negative sign ensures that the amount of information is positive or zero. The base of the logarithm can be chosen arbitrarily (information theory often uses base 2, in which case the unit of information is the bit; machine learning often uses the natural constant \(e\), and the unit is then called the nat). \(h(x)\) is also known as the self-information of the random variable \(x\): it describes the amount of information carried by one particular outcome of the random variable.
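A minimal Python sketch of self-information under these conventions; `self_information` is an illustrative helper name and the probability values are hypothetical.

```python
import math

def self_information(p, base=2):
    """Self-information h(x) = -log p(x); base 2 -> bits, base e -> nats."""
    return -math.log(p, base)

# Made-up probabilities for illustration: a rarer event carries more information.
print(self_information(0.01))              # ~6.64 bits
print(self_information(0.99))              # ~0.014 bits
print(self_information(0.5, base=math.e))  # ~0.693 nats
```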
