What is prefix encoding? What is Huffman coding?

Prefix encoding:

If, in a coding scheme, no code is a prefix (leftmost substring) of any other code, the scheme is called a prefix code.

How to tell whether a scheme is a prefix code

As shown in the figure above, unequal-length coding scheme 1 is a prefix code.

Analysis:

In unequal-length coding scheme 1, the four characters a, b, c, d have the codes 0, 10, 110, 111, and no code is a prefix of any other. For example, take the code 0 of a: none of 10, 110, and 111 starts with 0. The same check passes for the remaining codes: 10 is not a prefix of 110 or 111, and 110 and 111 have the same length but differ, so neither is a prefix of the other.

 

As shown in the figure above, unequal-length coding scheme 2 is not a prefix code.

Analysis:

A single counterexample is enough. Take the code 0: both 01 and 010 start with 0, so 0 is a prefix of other codes and the scheme does not meet the definition of a prefix code. The analysis ends there.
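
The two analyses above can be automated. The following is a minimal sketch, not taken from the original article: the function name is_prefix_code is an assumption, and because the fourth code of scheme 2 is only visible in the figure, the second test uses just the three codes named in the text.

```python
def is_prefix_code(codes):
    """Return True if no code is a prefix of a different code in the collection."""
    codes = list(codes)
    for i, a in enumerate(codes):
        for j, b in enumerate(codes):
            # Skip comparing a code with itself; any other match breaks the rule.
            if i != j and b.startswith(a):
                return False
    return True


# Scheme 1 from the text: a, b, c, d -> 0, 10, 110, 111
scheme1 = {"a": "0", "b": "10", "c": "110", "d": "111"}

# Scheme 2, partial: only the three codes mentioned in the analysis
scheme2_partial = ["0", "01", "010"]

print(is_prefix_code(scheme1.values()))  # True
print(is_prefix_code(scheme2_partial))   # False
```

The pairwise comparison is quadratic in the number of codes, which is fine for small tables like these.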

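You may already have noticed that equal-length codes are always prefix codes! A proper prefix is strictly shorter than the code it begins, so two different codes of the same length can never be prefixes of each other.

For example, the two-digit codes 00, 01, 10, 11 form a prefix code, and the same holds for three-digit, four-digit, ... n-digit codes.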

What prefix encoding is for:

Prefix encoding ensures that a compressed file can be decoded without ambiguity: reading the bits from left to right, as soon as the bits read so far match a code, that match is the only possible one.
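
As an illustration of that guarantee, here is a small sketch (the helper names decode and count_parsings are assumptions, not from the original article). It decodes a bit string with scheme 1 and counts how many ways a bit string can be split into codes, using the three codes of scheme 2 that the text mentions.

```python
def decode(bits, table):
    """Decode a bit string with a prefix code given as {character: code}."""
    reverse = {code: ch for ch, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        # In a prefix code, the first match is the only possible match.
        if current in reverse:
            out.append(reverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)


def count_parsings(bits, codewords):
    """Count the ways to split bits into a sequence of codewords."""
    # ways[i] = number of ways to split the first i bits
    ways = [1] + [0] * len(bits)
    for i in range(1, len(bits) + 1):
        for c in codewords:
            if i >= len(c) and bits[i - len(c):i] == c:
                ways[i] += ways[i - len(c)]
    return ways[-1]


scheme1 = {"a": "0", "b": "10", "c": "110", "d": "111"}
print(decode("0101100111", scheme1))              # abcad
print(count_parsings("010", ["0", "01", "010"]))  # 2: "01"+"0" or "010"
```

With the non-prefix codes, the bits 010 have two valid readings, which is exactly the ambiguity a prefix code rules out.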


Huffman coding:

For a Huffman tree with n leaves, assign 0 to each left branch and 1 to each right branch. The path from the root to each leaf then spells out a binary string, and that binary string is the Huffman code of the character stored at that leaf.
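
The labelling rule can be sketched as a tree walk. The representation below is an assumption made for illustration: a leaf is a one-character string and an internal node is a (left, right) tuple. The sample tree is hand-built so that it reproduces the codes of scheme 1.

```python
def codes_from_tree(tree):
    """Collect the code of every leaf: 0 for a left branch, 1 for a right branch."""
    codes = {}

    def walk(node, path):
        if isinstance(node, tuple):
            left, right = node
            walk(left, path + "0")   # left branch is labelled 0
            walk(right, path + "1")  # right branch is labelled 1
        else:
            codes[node] = path       # reached a leaf: the path is its code

    walk(tree, "")
    return codes


# Hand-built example tree (hypothetical): a is the left child of the root,
# b, c and d sit progressively deeper on the right.
tree = ("a", ("b", ("c", "d")))
print(codes_from_tree(tree))
# {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```

Because every character sits at a leaf, no root-to-leaf path can pass through another character, which is why the resulting codes are prefix-free.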

The main idea of Huffman coding:

During data compression, variable-length encoding can be used to make the compressed data file as short as possible.

The basic idea is:

Characters that appear frequently are given shorter codes, and characters that appear rarely are given longer ones. To ensure both effective compression of the data file and correct decoding of the compressed file, a Huffman tree can be used to design the binary codes.

The Huffman code satisfies two properties:

Property 1: A Huffman code is a prefix code.

Property 2: A Huffman code is an optimal prefix code.

For a file containing n distinct characters, a Huffman tree is built using the number of occurrences of each character as its weight, and the Huffman code read from that tree is used to encode the file. Among all prefix codes, this gives the compressed binary file the shortest possible length.
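
The construction can be sketched with the standard greedy method: repeatedly merge the two lightest subtrees until one tree remains. This is a minimal illustration under stated assumptions, not the article's own code; the function name huffman_codes, the tuple-based tree, and the sample text are all hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text):
    """Build a Huffman code for text, weighting each character by its count."""
    freq = Counter(text)
    if not freq:
        return {}
    tiebreak = count()  # unique counter so heap entries never compare nodes
    heap = [(w, next(tiebreak), ch) for ch, w in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct character still needs a one-bit code.
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # lightest subtree
        w2, _, right = heapq.heappop(heap)  # second lightest subtree
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, path):
        if isinstance(node, tuple):
            walk(node[0], path + "0")  # left branch: 0
            walk(node[1], path + "1")  # right branch: 1
        else:
            codes[node] = path

    walk(heap[0][2], "")
    return codes


text = "aaaabbbccd"  # counts: a=4, b=3, c=2, d=1
codes = huffman_codes(text)
total_bits = sum(len(codes[ch]) for ch in text)
print({ch: len(code) for ch, code in sorted(codes.items())})
# {'a': 1, 'b': 2, 'c': 3, 'd': 3}
print(total_bits)  # 19, versus 20 bits with a fixed 2-bit code
```

The exact bit patterns depend on how ties are broken, so the example prints code lengths rather than the codes themselves; the most frequent character, a, receives the shortest code.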

 

Origin blog.csdn.net/qq_30787727/article/details/112210023