Huffman Coding

Huffman Coding Principle

Huffman coding is a variable-length prefix coding that constructs codewords with the shortest average length according to the symbols' different probabilities of occurrence. The basic encoding method is to scan the source symbols first, count the probability of each symbol, and then assign codewords of different lengths according to those probabilities, building a code table so that the average codeword length of the source is the shortest.

For example, suppose the source has three symbols u1, u2, u3 with probabilities P1 = 0.2, P2 = 0.2, P3 = 0.6.

When encoding, first sort the three symbols by probability from small to large. Starting with the two source symbols of smallest probability, label one branch 0 and the other branch 1, then merge the probabilities of the two encoded branches and re-sort.

The above process is repeated until the combined probabilities add up to 1. Finally, the 0s and 1s encountered along the branches are read in reverse order (from the last merge back to the symbol), and the resulting bit string is the Huffman codeword of that symbol.
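To make the merge-and-label procedure concrete, here is a minimal Python sketch (not part of the original post) that repeatedly combines the two least probable entries and prepends the branch label 0 or 1 to their codewords; prepending at each merge is equivalent to reading the branch labels in reverse order at the end. The function name huffman_codes and the symbol names are chosen for illustration.

```python
import heapq

def huffman_codes(probabilities):
    """Build Huffman codewords from a {symbol: probability} mapping."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Take the two entries with the smallest probabilities,
        # label one branch 0 and the other branch 1, then merge them.
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        merged = {sym: "0" + cw for sym, cw in codes0.items()}
        merged.update({sym: "1" + cw for sym, cw in codes1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

# The example source from the text: P1 = 0.2, P2 = 0.2, P3 = 0.6
print(huffman_codes({"u1": 0.2, "u2": 0.2, "u3": 0.6}))
# {'u1': '00', 'u2': '01', 'u3': '1'}  (the 0/1 branch labels may be swapped)
```

As noted below, the 0/1 labels and the tie-breaking order are not fixed, so a different but equally optimal code table may result.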

As shown in the figure below, the Huffman codeword of u2 is "01". Huffman coding records the codeword of each symbol, and the correspondence between codewords and source symbols is recorded as a code table, as shown in Table 1.

Figure 1. The original Huffman code

However, the result of Huffman encoding is not unique. In the probability statistics, two source symbols may have equal probabilities, so the ordering of the queue is not unique; in addition, the choice of which branch is labeled 0 and which is labeled 1 is not fixed, so different encoding results are possible. In general, however, symbols with a high probability of occurrence are assigned shorter codewords and symbols with a low probability are assigned longer codewords. This allocates codewords according to probability, makes the average code length the shortest, and achieves the goal of lossless data compression.
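As a quick check of the "shortest average length" claim for this example, using the codewords u1 → 00, u2 → 01, u3 → 1 produced by the construction above, the average code length comes out well below the 2 bits per symbol a fixed-length code over three symbols would need:

```python
# Average code length for the example source, using the codewords
# u1 -> "00", u2 -> "01", u3 -> "1" from the construction above.
probs = {"u1": 0.2, "u2": 0.2, "u3": 0.6}
codes = {"u1": "00", "u2": "01", "u3": "1"}
avg_len = sum(probs[s] * len(codes[s]) for s in probs)
print(avg_len)  # 1.4 bits/symbol, versus 2 bits/symbol for a fixed-length code
```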

Table 1. Huffman coding table:

Symbol    Probability    Codeword
u1        0.2            00
u2        0.2            01
u3        0.6            1
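A short sketch of how such a code table is used in practice: encode replaces each symbol with its codeword, and because no codeword is a prefix of another, decode can recover the original symbols greedily, bit by bit. The table below mirrors Table 1; the helper names are illustrative, not from the original post.

```python
# Code table from Table 1 (u2 = "01" is stated in the text; the other
# codewords follow from the construction).
code_table = {"u1": "00", "u2": "01", "u3": "1"}
decode_table = {cw: sym for sym, cw in code_table.items()}

def encode(symbols):
    return "".join(code_table[s] for s in symbols)

def decode(bits):
    symbols, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in decode_table:  # prefix codes decode greedily, bit by bit
            symbols.append(decode_table[buffer])
            buffer = ""
    return symbols

message = ["u3", "u2", "u1", "u3"]
bits = encode(message)          # "101001"
assert decode(bits) == message  # lossless round trip
```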
