LZ78编码算法

LZ78算法的压缩过程非常简单。在压缩时维护一个动态词典Dictionary,其包括了历史字符串的index与内容;压缩情况分为三种:

  1. 若当前字符c未出现在词典中,则编码为(0, c)
  2. 若当前字符c出现在词典中,则与词典做最长匹配,然后编码为(prefixIndex,lastChar),其中,prefixIndex为最长匹配的前缀字符串,lastChar为最长匹配后的第一个字符;
  3. 为对最后一个字符的特殊处理,编码为(prefixIndex,)

如果对于上述压缩的过程稍感费解,下面给出三个例子。例子一,对于字符串“ABBCBCABABCAABCAAB”压缩编码过程如下:


1. A is not in the Dictionary; insert it

2. B is not in the Dictionary; insert it

3. B is in the Dictionary. BC is not in the Dictionary; insert it.

4. B is in the Dictionary. BC is in the Dictionary. BCA is not in the Dictionary; insert it.

5. B is in the Dictionary. BA is not in the Dictionary; insert it.

6. B is in the Dictionary. BC is in the Dictionary. BCA is in the Dictionary. BCAA is not in the Dictionary; insert it.

7. B is in the Dictionary. BC is in the Dictionary. BCA is in the Dictionary. BCAA is in the Dictionary. BCAAB is not in the Dictionary; insert it.


例子二,对于字符串“BABAABRRRA”压缩编码过程如下:


1. B is not in the Dictionary; insert it

2. A is not in the Dictionary; insert it

3. B is in the Dictionary. BA is not in the Dictionary; insert it.

4. A is in the Dictionary. AB is not in the Dictionary; insert it.

5. R is not in the Dictionary; insert it.

6. R is in the Dictionary. RR is not in the Dictionary; insert it.

7. A is in the Dictionary and it is the last input character; output a pair containing its index: (2, )


如何进行解压:

解压缩能更根据压缩编码恢复出(压缩时的)动态词典,然后根据index拼接成解码后的字符串。为了便于理解,我们拿上述例子一中的压缩编码序列(0, A) (0, B) (2, C) (3, A) (2, A) (4, A) (6, B)来分解解压缩步骤,如下图所示:

前后拼接后,解压缩出来的字符串为“ABBCBCABABCAABCAAB”。

猜你喜欢

转载自blog.csdn.net/qq_41989372/article/details/84921845
78
今日推荐