AI Machine Learning - Information Entropy, Conditional Entropy, Information Gain

Information entropy

Information entropy is a measure of the disorder, or uncertainty, of a system. To pin down something highly uncertain, something we know little about, we need a lot of information. Conversely, if we already know a lot about something, little additional information is needed to pin it down.

Therefore, from this perspective, the amount of information can be equated with the amount of uncertainty. The more orderly a system, the lower its information entropy; conversely, the more chaotic a system, the higher its information entropy. Shannon introduced the concept of "information entropy" in 1948, so it is also called Shannon entropy. Assuming the proportion of samples of the i-th class in a set D is p_i (i = 1, 2, ..., n), the information entropy of D can be expressed as:

Ent(D) = -\sum_{i=1}^{n} p_i \log_2 p_i
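
As a concrete illustration, here is a minimal Python sketch of this formula. The function name `entropy` and the toy label list are choices made for this example; the proportions p_i are simply the class frequencies in the list.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Ent(D) = -sum_i p_i * log2(p_i), where p_i is the proportion of class i in D."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

# Toy set D with 9 samples of one class and 5 of another (p_1 = 9/14, p_2 = 5/14).
labels = ["yes"] * 9 + ["no"] * 5
print(entropy(labels))  # ≈ 0.940
```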


Conditional entropy

Now assume the training data D is partitioned according to an attribute A. If A has v possible values, splitting on A produces v subsets (that is, v branches in the tree); let D_j denote the subset whose samples take the j-th value of A. The conditional entropy of D given attribute A is then (|D_j| and |D| denote the number of elements in each set):

Ent(D \mid A) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \, Ent(D_j)
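
Continuing the sketch above (it reuses the `entropy` helper defined there), a hypothetical `conditional_entropy` function could weight each subset D_j by |D_j|/|D|. The row/attribute/label names are assumptions for the example, not part of the article:

```python
def conditional_entropy(rows, attribute, label):
    """Ent(D|A) = sum_j |D_j|/|D| * Ent(D_j), where D_j groups rows by their value of `attribute`."""
    total = len(rows)
    result = 0.0
    for value in {row[attribute] for row in rows}:                        # the v possible values of A
        subset = [row[label] for row in rows if row[attribute] == value]  # class labels of subset D_j
        result += (len(subset) / total) * entropy(subset)                 # weight Ent(D_j) by |D_j| / |D|
    return result
```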


Information gain

Subtracting the conditional entropy from the information entropy measures how much this condition reduces the entropy, that is, how much uncertainty it removes from the judgment. The larger the value, the more the attribute reduces the entropy, and therefore the more useful the attribute is for making the decision. The information gain of attribute A is calculated as:

Gain(D, A) = Ent(D) - Ent(D \mid A) = Ent(D) - \sum_{j=1}^{v} \frac{|D_j|}{|D|} \, Ent(D_j)
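
Putting the pieces together, a sketch of the information gain, again reusing the helpers above. The small "outlook"/"play" table below is an invented toy dataset for illustration, not data from this article:

```python
def information_gain(rows, attribute, label):
    """Gain(D, A) = Ent(D) - Ent(D|A)."""
    return entropy([row[label] for row in rows]) - conditional_entropy(rows, attribute, label)

rows = [
    {"outlook": "sunny",    "play": "no"},
    {"outlook": "sunny",    "play": "no"},
    {"outlook": "overcast", "play": "yes"},
    {"outlook": "rain",     "play": "yes"},
    {"outlook": "rain",     "play": "yes"},
    {"outlook": "rain",     "play": "no"},
]
print(information_gain(rows, "outlook", "play"))  # 1.0 - 0.459 ≈ 0.541
```

A decision-tree learner such as ID3 would compute this gain for every candidate attribute and split on the one with the largest value.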

