Several coding methods in information theory

A summary of several coding methods in information theory.

Variable-length coding theorem

If the entropy of a discrete memoryless source is H(U), and each source symbol is encoded with a variable-length code over a code alphabet of D symbols, then there exists a lossless, uniquely decodable coding method whose average code length satisfies:
\[ \frac{H(U)}{\log D} \leq \overline{n} \leq \frac{H(U)}{\log D} + 1 \]
For a discrete stationary memoryless source with average symbol entropy H_L(U) (encoding blocks of L source symbols at a time), there must exist a lossless coding method whose average code length satisfies:
\[ \frac{H_L(U)}{\log D} \leq \overline{n} \leq \frac{H_L(U)}{\log D} + \frac{1}{L} \]
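As a quick numeric illustration of these bounds (the distribution below is my own choice, not from the original text):

```python
import math

# Illustrative source (my own numbers): p = (0.9, 0.1), binary code (D = 2)
p = [0.9, 0.1]
D = 2

H = -sum(pi * math.log2(pi) for pi in p)   # entropy H(U) in bits
lower = H / math.log2(D)                   # H(U) / log D

print(f"H(U) = {H:.3f} bits; {lower:.3f} <= n_bar <= {lower + 1:.3f}")

# For a memoryless source H_L(U) = H(U); encoding blocks of L symbols
# shrinks the slack in the upper bound from 1 to 1/L:
L = 4
print(f"L = {L}: {lower:.3f} <= n_bar <= {lower + 1 / L:.3f}")
```

Single-symbol coding can waste up to one code symbol per source symbol; block coding amortizes that overhead over L symbols.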

Shannon coding

Encoding steps

  1. Sort the source symbols in descending order of probability;
  2. Compute each symbol's code length, k_i = ceil(-log2(p_i)) for a binary code;
  3. Compute each symbol's cumulative probability P_i = p_1 + p_2 + … + p_(i-1);
  4. Write each cumulative probability as a binary fraction;
  5. Take the first k_i bits of that binary fraction as the codeword.

Coding examples
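A minimal Python sketch of the five steps for a binary code; the probability list is an illustrative choice, not from the original:

```python
import math

def shannon_code(probs):
    """Shannon coding for a list of symbol probabilities (binary code)."""
    probs = sorted(probs, reverse=True)       # step 1: sort descending
    codes = []
    cum = 0.0                                 # cumulative probability P_i
    for p in probs:
        k = math.ceil(-math.log2(p))          # step 2: k_i = ceil(-log2 p_i)
        frac, bits = cum, ""                  # steps 4-5: binary expansion of P_i,
        for _ in range(k):                    # keeping the first k_i bits
            frac *= 2
            bits += str(int(frac))
            frac -= int(frac)
        codes.append((p, bits))
        cum += p                              # step 3: P_{i+1} = P_i + p_i
    return codes

for p, code in shannon_code([0.4, 0.3, 0.2, 0.1]):
    print(f"p = {p}: {code}")
```

For probabilities 0.4, 0.3, 0.2, 0.1 this prints the codewords 00, 01, 101, 1110, which form a prefix-free code.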

Fano coding

Encoding steps

  1. Sort the source symbols in descending order of probability;
  2. Split the ordered set at the position that makes the total probabilities of the two subsets as nearly equal as possible; assign 0 to the front subset and 1 to the back subset;
  3. Repeat step (2) until every subset contains exactly one element;
  4. For each symbol, concatenate in order the bits assigned to the subsets it fell into; the result is its codeword.

Coding examples
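A minimal recursive sketch of the procedure, again with an illustrative probability list; ties for the most balanced split are broken at the first best cut:

```python
def fano_code(probs):
    """Shannon-Fano coding: recursively split a descending-sorted
    probability list into two subsets of nearly equal total probability."""
    probs = sorted(probs, reverse=True)       # step 1: sort descending
    codes = [""] * len(probs)

    def split(lo, hi):                        # encode probs[lo:hi]
        if hi - lo <= 1:                      # step 3: stop at single elements
            return
        total = sum(probs[lo:hi])
        acc, cut, best = 0.0, lo + 1, float("inf")
        for i in range(lo, hi - 1):           # step 2: most balanced cut point
            acc += probs[i]
            if abs(2 * acc - total) < best:
                best, cut = abs(2 * acc - total), i + 1
        for i in range(lo, hi):               # front subset -> 0, back -> 1
            codes[i] += "0" if i < cut else "1"
        split(lo, cut)
        split(cut, hi)

    split(0, len(probs))
    return list(zip(probs, codes))

for p, code in fano_code([0.4, 0.3, 0.2, 0.1]):
    print(f"p = {p}: {code}")
```

For 0.4, 0.3, 0.2, 0.1 the codewords come out as 0, 10, 110, 111.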

Huffman coding

Encoding steps

  1. Sort the source symbols in descending order of probability; the probabilities serve as weights;
  2. Take the element with the smallest probability as a left node and the next smallest as a right node, merge the two into a new element whose weight is the sum of their probabilities, and re-sort it together with the remaining elements;
  3. Repeat step (2) until only one element remains;
  4. The tree generated this way is the Huffman tree;
  5. Label each left branch 0 and each right branch 1; the path from the root to a leaf is that symbol's codeword.

Coding examples
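A minimal heap-based sketch (illustrative probabilities; the integer in each heap entry only breaks ties between equal weights):

```python
import heapq

def huffman_code(probs):
    """Huffman coding: repeatedly merge the two smallest-probability
    elements; left branches get 0, right branches get 1."""
    # Heap entries: (weight, tie-breaker, {symbol index: code so far})
    heap = [(p, i, {i: ""}) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:                       # step 3: until one element remains
        p1, _, left = heapq.heappop(heap)      # smallest -> left node (bit 0)
        p2, _, right = heapq.heappop(heap)     # next smallest -> right node (bit 1)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p1 + p2, tie, merged))  # step 2: weight = sum
        tie += 1
    return heap[0][2]

probs = [0.4, 0.3, 0.2, 0.1]
for i, code in sorted(huffman_code(probs).items()):
    print(f"p = {probs[i]}: {code}")
```

For 0.4, 0.3, 0.2, 0.1 the codeword lengths are 1, 2, 3, 3, giving an average length of 1.9 code symbols against an entropy of about 1.85 bits.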

Arithmetic coding

Encoding steps

  1. Sort the source symbols in descending order of probability;
  2. Partition the interval [0, 1) into subintervals, one per symbol; each symbol occupies an interval [L, H);
  3. Start encoding with the whole interval: set low = 0, high = 1;
  4. Read the input symbols one at a time, computing and updating (a sketch follows this list):
    \[ \begin{aligned} \text{low}_{\text{new}} &= \text{low} + (\text{high} - \text{low}) \cdot L \\ \text{high}_{\text{new}} &= \text{low} + (\text{high} - \text{low}) \cdot H \end{aligned} \]
  5. Output any number in the final interval [low, high) in binary form; that is the codeword.
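A minimal sketch of the loop in step 4. Note that both updates must use the old values of low and high, hence the temporary rng; the symbol intervals below mirror Example 1 further down (p(0) = 1/4, p(1) = 3/4):

```python
def arithmetic_interval(seq, intervals):
    """Shrink [low, high) once per input symbol using that
    symbol's subinterval [L, H) of [0, 1)."""
    low, high = 0.0, 1.0                  # step 3
    for ch in seq:                        # step 4
        L, H = intervals[ch]
        rng = high - low                  # old width: both updates need it
        high = low + rng * H
        low = low + rng * L
    return low, high                      # step 5: emit any number in here

# Subintervals mirroring Example 1 below: p(0) = 1/4, p(1) = 3/4
intervals = {"0": (0.0, 0.25), "1": (0.25, 1.0)}
low, high = arithmetic_interval("11111100", intervals)
print(low, high)   # final interval, width p(S) = (3/4)**6 * (1/4)**2
```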

Coding examples

Note: write the cumulative probability of the symbol sequence as a binary fraction and take the first L digits after the binary point; if a nonzero tail follows, carry into the L-th digit (round up, do not truncate!).
Recurrence for the cumulative probability: P(S a_r) = P(S) + p(S) P_r
Binary sequence: S = 011
P(S0 → 0110) = P(S) + p(S) P_0 = P(S) (since P_0 = 0)
P(S1 → 0111) = P(S) + p(S) P_1

Example 1: Let a binary memoryless source have S = {0, 1} with p(0) = 1/4, p(1) = 3/4. Arithmetically encode S = 11111100.
Solution:
P_0 = 0, P_1 = 1/4; P(S0) = P(S) + p(S) P_0 = P(S); P(S1) = P(S) + p(S) P_1.
(Here P_1 = p(0); in general, for a binary source, P_0 = 0 and P_1 = p(0).)

Example 2: Let a memoryless source U = {a1, a2, a3, a4} have probabilities 0.5, 0.25, 0.125, 0.125. Arithmetically encode a source sequence.
Solution:
P_1 = 0, P_2 = 1/2, P_3 = 3/4, P_4 = 7/8; P(S a_r) = P(S) + p(S) P_r.
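The recurrence and the carry rule can be checked mechanically. A minimal sketch in exact arithmetic that reproduces Example 1, assuming the usual codeword-length rule L = ceil(log2(1/p(S))):

```python
from fractions import Fraction
from math import ceil, log2

def arithmetic_encode(seq, p, P):
    """Cumulative-probability form: P(S a_r) = P(S) + p(S) * P_r."""
    PS, pS = Fraction(0), Fraction(1)
    for ch in seq:
        PS, pS = PS + pS * P[ch], pS * p[ch]
    k = ceil(-log2(pS))             # assumed length rule: ceil(log2 1/p(S))
    scaled = PS * 2**k              # first k binary digits of P(S) ...
    return format(ceil(scaled), f"0{k}b")   # ... rounded UP if a tail remains

# Example 1: p(0) = 1/4, p(1) = 3/4, hence P_0 = 0, P_1 = 1/4
p = {"0": Fraction(1, 4), "1": Fraction(3, 4)}
P = {"0": Fraction(0), "1": Fraction(1, 4)}
print(arithmetic_encode("11111100", p, P))   # -> 1101010
```

Here P(S) = 3367/4096 = 0.110100100111 in binary and p(S) = 729/65536, so L = 7; the nonzero tail after the first seven digits forces the carry, giving the codeword 1101010.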

LZ coding (LZ78)

Let the source alphabet A = {a1, a2, …, aK} contain K symbols, and let the input source sequence be u = (u1, u2, …, uL).

Encoding steps

  1. Take the first symbol as the first phrase;
  2. If the next symbol does not match any existing phrase, take it as the next phrase;
  3. If the next symbol matches an existing phrase, keep appending the following symbols until the string differs from all previous phrases, and take that string as the next phrase;
  4. Repeat steps (2) and (3) until the source sequence ends.

Coding examples

Let U = {a1, a2, a3, a4}, and let the source sequence be a1, a3, a2, a3, a2, a4, a3, a2, a1, a4, a3, a2, a1, 13 symbols in total. The dictionary is built phrase by phrase as follows:
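The segmentation is fully determined by steps (1)-(4); a minimal parser sketch, whose output on the sequence above is the dictionary content, the eight phrases a1, a3, a2, a3a2, a4, a3a2a1, a4a3, a2a1:

```python
def lz78_parse(seq):
    """LZ78 parsing: each new phrase is the shortest prefix of the
    remaining input that is not yet in the dictionary."""
    phrases, seen = [], set()
    current = []
    for sym in seq:
        current.append(sym)
        key = tuple(current)
        if key not in seen:           # steps 2-3: extend until unseen
            seen.add(key)
            phrases.append(key)
            current = []
    if current:                       # trailing symbols equal to an old phrase
        phrases.append(tuple(current))
    return phrases

seq = ["a1", "a3", "a2", "a3", "a2", "a4", "a3", "a2",
       "a1", "a4", "a3", "a2", "a1"]
for i, phrase in enumerate(lz78_parse(seq), 1):
    print(i, "".join(phrase))
```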

