Data structure - Huffman coding and Huffman tree

Disclaimer: This article is a blogger original article, shall not be reproduced without the bloggers allowed. https://blog.csdn.net/starter_____/article/details/90692568

basic concept

In all binary weighted path length called optimal binary Huffman tree, which is constituted by n leaf nodes WPL weighted shortest binary tree.

  • Path: refers to a branch from a junction node between the other sequence.

  • Path length: refers to the number from one node to another branch node elapsed.

  • Tree path length: the tree and all leaf nodes of the path length.

  • Right: to each node of the tree gives a real number with some real meaning, we say that the real number is the right node.

  • Weighted Path Length: In the tree structure, the product of our right to a path length from the root node to the node, is called the weighted path length of the node.

  • Weighted tree path length: length of the path and the weighted sum of all the leaves of the tree nodes.

Here Insert Picture Description
As shown, the weight of leaf nodes A, B, C, D, respectively, 7,5,2,4, since the number of branches to the root node is A 2, the path length of A 2, A is weighted path length of 5 * 2 = 10, 3 = 6,4 * 2 * 2 = 8, so that the path length weighted binary tree is WPL = 14 + 10 + 6 + 8 = 38


Huffman tree constructor

Given n weights, weights with the n Huffman tree is constructed algorithm is described as follows:

(1) these n weight values ​​are regarded as the root node only n binary trees, a binary tree configuration of the set referred to as F

(2) select two root minimum weight in a binary tree in the forest F, a new binary tree as the left, right subtree, the root node is the right to mark the new binary tree root weights of the left and right subtrees and value.

(3) delete selected from F in that two binary trees, while the newly formed binary tree added to the forest F.

(4) repeat (2), (3) operation, until the date a forest contains only binary, binary tree obtained at this time is the Huffman tree.


Huffman tree configuration example

Here Insert Picture Description

(A) given four nodes a, b, c, d, respectively, 7,5,2,4 weight, as it is only the root of the binary tree 4

(B) the right values ​​of the smallest root of two, i.e. right value of 2 and 4 two root search appear. Construction of a new binary tree, the root weight is 4 + 2 = 6, the value in the right while the original 2 and 4 deleted, a new weight is added 6

(C) the root node of the right two values ​​to find the smallest root of two, i.e. with a weight of 5 and 6 appear. Construction of a new binary tree, the root weight was 6 + 5 = 11, while the original value of the right to delete 5 and 6, the addition of the new weights 11

(D) the root node of the right two values ​​the smallest root of two, i.e. right value of 7 and 11 appear to find. Construction of a new binary tree, the root weight is 11 + 7 = 18, while the original values ​​of the right 7 and 11 deleted, the new weight 18 is added, only the next tree in the set F, get Huffman tree.

Note: For the same set of nodes, the Huffman tree structure may not be unique, for example: if a, b, c, d weights of the four nodes of 3,2,2,2, respectively, selected minimum weight junction there are two different options, such as b and c or b and d


Huffman coding

Huffman coding is built on the basis of the Huffman tree, the biggest advantage of this approach is encoded with the least character the most informative content. According to the content of information sent, by counting the number of characters in the same text as the weight of each character, the establishment of Huffman tree.

  • Prefix encoding: encoding can not be a prefix of any other arbitrary encoding
  • For each sub-tree in the unified provisions of its left child labeled 0, 1 mark right child.
  • Which character to use when, starting from the root node Huffman tree, in order to write tag via a node, the end result is the node Huffman coding.
  • The more the number of text characters that appear, reflected in the Huffman tree is closer to the roots. Shorter length coding.

Here Insert Picture Description

As shown, a maximum number of characters used, followed by a character b. In the Huffman coding is a character 0, character encoding 10 b, c, encoded as a character 110, the character code 111 d.


N Huffman tree

The following ternary tree Huffman Example
Here Insert Picture Description
(1) sequence can not be directly constructed trigeminal Huffman tree nodes need added something right value 0 h
Here Insert Picture Description

(2) modeled on the above-described Huffman binary tree node selected merge, eventually give

Here Insert Picture Description

Guess you like

Origin blog.csdn.net/starter_____/article/details/90692568