Information Theory Lectures (1)

1. Self-information and mutual information
Definition 2.1: The self-information of a random event is defined as the negative value of the logarithm of the probability of occurrence of the event, that is, I(x) = -logp(x)

Note: Usually the base of the logarithm is 2, and the unit of information is the bit; when the base is 2, it is often omitted from the notation.
If the base of the logarithm is e, the unit of self-information is the nat, commonly used in theoretical derivations and for continuous sources. In engineering, base 10 is common, and the unit is the Hartley.
If the logarithm is taken with base r, I(x)=-logr p(x), measured in r-ary units.
1 r-ary unit = log2 r bits
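
As a quick check on these unit conversions, here is a minimal Python sketch; the probability value 0.25 and the function name self_information are illustrative assumptions, not part of the lecture:

```python
import math

def self_information(p, base=2):
    """I(x) = -log_base p(x); base 2 -> bits, base e -> nats, base 10 -> Hartleys."""
    return -math.log(p, base)

p = 0.25  # assumed example probability
print(self_information(p, 2))        # 2.0 bits
print(self_information(p, math.e))   # ~1.386 nats
print(self_information(p, 10))       # ~0.602 Hartleys
print(math.log2(math.e))             # 1 nat = log2(e) ≈ 1.443 bits
```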

Definition 2.2: The information that one event y gives about another event x is defined as the mutual information, denoted I(x;y):
I(x;y)=I(x)-I(x|y)=-logp(x)+logp(x|y)=log[p(x|y)/p(x)]

Understanding: The self-information of an event with probability 1 is 0, and the self-information of an event whose probability approaches 0 approaches infinity. In other words, self-information reflects the uncertainty of the event: the larger the self-information, the greater the uncertainty and the smaller the probability of occurrence. Mutual information is the uncertainty about event x that is eliminated once event y is known; it is an amount of removed uncertainty. The larger the mutual information, the more useful observing y is for determining event x.
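
A small numerical sketch of Definition 2.2, using assumed example values p(x)=0.25 and p(x|y)=0.5, shows how observing y removes part of the uncertainty about x:

```python
import math

def event_mutual_information(p_x, p_x_given_y):
    """I(x;y) = log2[p(x|y)/p(x)] = I(x) - I(x|y), in bits."""
    return math.log2(p_x_given_y / p_x)

p_x, p_x_given_y = 0.25, 0.5             # assumed example probabilities
I_x = -math.log2(p_x)                    # self-information of x: 2 bits
I_x_given_y = -math.log2(p_x_given_y)    # remaining uncertainty after observing y: 1 bit
print(event_mutual_information(p_x, p_x_given_y))  # 1.0 bit of uncertainty removed
print(I_x - I_x_given_y)                            # same value by Definition 2.2
```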

Definition 2.3: The statistical mean of the self-information I(xi) over all possible values of the random variable X is defined as the average self-information of X, that is, H(X)=-Σp(xi)logp(xi). Average self-information is also known as information entropy.
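
A minimal Python sketch of Definition 2.3; the distributions below are arbitrary assumed examples:

```python
import math

def entropy(probs):
    """H(X) = -sum p(x) log2 p(x); zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.25, 0.25]))  # 1.5 bits
print(entropy([1.0, 0.0]))         # 0.0: a certain outcome carries no information
```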

2. Entropy function: The information entropy H(X) is a function of the probability distribution of the random variable X, so it is also called the entropy function. If the probability distribution p(xi), i=1,2,…,q is denoted p1,p2,…,pq, then the entropy can be written as a function of the probability vector p=(p1,p2,…,pq), denoted H(p).
H(p)=-Σpilogpi=H(p1,p2,…,pq)=H(X)

Properties:
1. Symmetry: the order of the components can be permuted arbitrarily without changing the entropy.
2. Determinism: as soon as one component equals 1 (the outcome is certain), the entropy is 0.
3. Non-negativity.
4. Expandability: since c·logc → 0 as c → 0, adding a small-probability event that essentially never occurs leaves the source entropy unchanged.
5. Continuity: small changes in the probability components of the source's probability space cause only small changes in the entropy.
6. Recursivity: suppose a source has n symbols with probability distribution p1,p2,…,pn, and the symbol xn is split into m symbols with probabilities q1,q2,…,qm (summing to pn). The entropy of the new source equals the original source entropy plus pnH(q1/pn, q2/pn,…,qm/pn), the extra term being contributed by the split.
7. Extremality: for a discrete source with n symbols, the entropy is largest when all messages are equally probable, and the maximum value is logn (see the sketch after this list).
8. Convexity (upward convexity): H(p) is a concave function of the probability vector p.
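
The following sketch checks the extremality and recursivity properties numerically; the specific distributions are assumed examples chosen only for illustration:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Extremality (property 7): the uniform distribution over n symbols attains log2(n).
n = 4
print(entropy([1 / n] * n), math.log2(n))              # both 2.0
print(entropy([0.7, 0.1, 0.1, 0.1]) < math.log2(n))    # True: non-uniform entropy is smaller

# Recursivity (property 6): split the last symbol (p3 = 0.4) into q = (0.3, 0.1).
p = [0.3, 0.3, 0.4]
q = [0.3, 0.1]
lhs = entropy([0.3, 0.3] + q)                           # entropy of the new, split source
rhs = entropy(p) + 0.4 * entropy([qi / 0.4 for qi in q])
print(abs(lhs - rhs) < 1e-9)                            # True: the recursivity identity holds
```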

3. Joint entropy and conditional entropy
Definition 2.4: The joint entropy of a two-dimensional random variable XY is defined as the mathematical expectation of the joint self-information, which is a measure of the uncertainty of the two-dimensional random variable XY
H(XY)=ΣΣp(xiyj)log[1/p(xiyj)]
conditional entropy
Definition 2.5: H(Y|X)=Σp(xi)H(Y|xi)=-ΣΣp(xiyj)logp(yj|xi), where H(Y|X) denotes the average uncertainty of Y when X is known.

Similarly H(X|Y)=-ΣΣp(xiyj)logp(xi|yj)

The relationship between various types of entropy is as follows:
H(XY)=H(X)+H(Y|X)
Proof: H(XY)=-ΣΣp(xiyj)logp(xiyj)=-ΣΣp(xiyj)log[p(xi)p(yj|xi)]=-ΣΣp(xiyj)logp(xi)-ΣΣp(xiyj)logp(yj|xi)=H(X)+H(Y|X)<=H(X)+H(Y)
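
A sketch that verifies this chain rule numerically; the 2x2 joint distribution p_xy below is an assumed example:

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Assumed joint distribution p(x, y): rows index x, columns index y.
p_xy = [[0.4, 0.1],
        [0.2, 0.3]]

p_x = [sum(row) for row in p_xy]        # marginal distribution of X
p_y = [sum(col) for col in zip(*p_xy)]  # marginal distribution of Y

H_XY = H([p for row in p_xy for p in row])  # joint entropy H(XY)
# H(Y|X) = -sum p(x,y) log2 p(y|x), with p(y|x) = p(x,y)/p(x)
H_Y_given_X = -sum(p_xy[i][j] * math.log2(p_xy[i][j] / p_x[i])
                   for i in range(2) for j in range(2) if p_xy[i][j] > 0)

print(abs(H_XY - (H(p_x) + H_Y_given_X)) < 1e-9)  # chain rule H(XY) = H(X) + H(Y|X)
print(H_XY <= H(p_x) + H(p_y))                    # and H(XY) <= H(X) + H(Y)
```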

4. Average Mutual Information
Mutual information I(xi;yj) represents the information about one event xi given by another event yj, and it varies with xi and yj. To express the amount of information that a random variable Y as a whole gives about another random variable X, the statistical mean of the mutual information I(xi;yj) over the joint probability space of XY is defined as the average mutual information between the random variables X and Y.
Definition 2.6: I(X;Y)=ΣΣp(xiyj)I(xi;yj)=ΣΣp(xiyj)log[p(xi|yj)/p(xi)]=H(X)-H(X|Y)
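
A sketch of Definition 2.6 on an assumed 2x2 joint distribution, using the equivalent form p(x|y)/p(x)=p(xy)/(p(x)p(y)) and checking I(X;Y)=H(X)-H(X|Y) together with the non-negativity and extreme-value properties listed below:

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

p_xy = [[0.4, 0.1],   # assumed joint distribution p(x, y)
        [0.2, 0.3]]
p_x = [sum(row) for row in p_xy]
p_y = [sum(col) for col in zip(*p_xy)]

# I(X;Y) = sum p(x,y) log2[ p(x|y)/p(x) ] = sum p(x,y) log2[ p(x,y)/(p(x)p(y)) ]
I_XY = sum(p_xy[i][j] * math.log2(p_xy[i][j] / (p_x[i] * p_y[j]))
           for i in range(2) for j in range(2) if p_xy[i][j] > 0)

# H(X|Y) = -sum p(x,y) log2 p(x|y), with p(x|y) = p(x,y)/p(y)
H_X_given_Y = -sum(p_xy[i][j] * math.log2(p_xy[i][j] / p_y[j])
                   for i in range(2) for j in range(2) if p_xy[i][j] > 0)

print(abs(I_XY - (H(p_x) - H_X_given_Y)) < 1e-9)  # I(X;Y) = H(X) - H(X|Y)
print(0 <= I_XY <= min(H(p_x), H(p_y)))           # non-negativity and the extreme-value bound
```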

Properties:
1. Non-negativity
2. Reciprocity (symmetry) I(X;Y)=I(Y;X)
3. I(X;Y)=H(X)-H(X|Y)
= H(Y)-H(Y|X)
=H(X)+H(Y)-H(XY)
4. Extreme value I(X;Y)<=H(X),I(X;Y) <=H(Y)
5. Convexity: I(X;Y) is a concave function of the input distribution p(x) for fixed p(y|x), and a convex function of p(y|x) for fixed p(x).

Definition 2.7: Average conditional mutual information
I(X;Y|Z)=ΣΣΣp(xyz)log[p(x|yz)/p(x|z)]
Definition 2.8: Average joint mutual information
I(X;YZ)=ΣΣΣp(xyz)log[p(x|yz)/p(x)]
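
A sketch instantiating Definitions 2.7 and 2.8 on an assumed three-variable joint distribution stored as a dictionary; the distribution and the marginal helper are illustrative assumptions:

```python
import math
from collections import defaultdict

# Assumed joint distribution p(x, y, z); the probabilities sum to 1.
p_xyz = {(0, 0, 0): 0.2, (0, 1, 0): 0.1, (1, 0, 0): 0.1, (1, 1, 0): 0.1,
         (0, 0, 1): 0.1, (0, 1, 1): 0.1, (1, 0, 1): 0.1, (1, 1, 1): 0.2}

def marginal(keep):
    """Marginalize p(x, y, z) onto the coordinate positions listed in `keep`."""
    m = defaultdict(float)
    for xyz, p in p_xyz.items():
        m[tuple(xyz[i] for i in keep)] += p
    return m

p_x, p_z = marginal([0]), marginal([2])
p_xz, p_yz = marginal([0, 2]), marginal([1, 2])

# Definition 2.7: I(X;Y|Z) = sum p(xyz) log2[ p(x|yz) / p(x|z) ]
I_X_Y_given_Z = sum(p * math.log2((p / p_yz[(y, z)]) / (p_xz[(x, z)] / p_z[(z,)]))
                    for (x, y, z), p in p_xyz.items() if p > 0)

# Definition 2.8: I(X;YZ) = sum p(xyz) log2[ p(x|yz) / p(x) ]
I_X_YZ = sum(p * math.log2((p / p_yz[(y, z)]) / p_x[(x,)])
             for (x, y, z), p in p_xyz.items() if p > 0)

print(I_X_Y_given_Z, I_X_YZ)
```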
