Python implements Huffman tree

The Huffman tree is a special binary tree, a binary tree with the shortest weighted path length, also known as the optimal binary tree.

Given N weights as the weights of the N leaf nodes of the binary tree, a binary tree is constructed. If the weighted path length of the binary tree reaches the minimum, the binary tree is called a Huffman tree.

The nodes with larger weights in the Huffman tree are closer to the root.

Huffman trees are mainly used in information coding and data compression, and Huffman coding is a building block of many modern compression algorithms.

1. Huffman tree terminology

A Huffman tree is the binary tree with the minimum weighted path length. To make sense of that, we first need to answer three questions: what is a weight, what is a path, and what is the weighted path length?

1. Path

In a tree, the route from a node down to one of its child or descendant nodes is called a path.

2. The weight of the node

In a specific application scenario, each node of the binary tree corresponds to some business meaning and is assigned a numeric value that reflects it. This value is called the weight of the node.

As shown in the figure below, the weight of node C is 5.

3. The path length of the node

The root node is at the first level of the binary tree, and the path length of the node at the Lth level in the binary tree is L-1.

As shown in the figure above, node F is at the third level of the binary tree, and the path length of F is 3-1 = 2.

The path length of a node can also be understood this way: if each link from a node to its child counts as one path unit, then the path length of a node is the number of path units from the root node to that node.

4. The weighted path length of the node

The product of the node's weight and the node's path length is called the node's weighted path length.

In the above figure, the weight of node D is 18 and its path length is 2, so the weighted path length of node D is 18*2 = 36.

5. Weighted path length of the tree

The sum of the weighted path lengths of all leaf nodes of the tree is called the weighted path length of the tree, and is denoted as WPL (Weighted Path Length of Tree).

As shown in the figure above, the weighted path length of the binary tree is WPL = 18*2 + 7*2 + 6*2 + 17*2 = 96
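The definition above translates directly into a small calculation. The following is a minimal sketch, assuming the tree is described only by a list of (weight, path length) pairs for its leaves; the function name weighted_path_length is chosen here for illustration and is not part of the article's code.

```python
def weighted_path_length(leaves):
    """Return the WPL for an iterable of (weight, path_length) pairs, one per leaf."""
    return sum(weight * path_length for weight, path_length in leaves)


# Leaves of the tree in the figure: weights 18, 7, 6 and 17, each with path length 2.
figure_leaves = [(18, 2), (7, 2), (6, 2), (17, 2)]
print(weighted_path_length(figure_leaves))  # 96
```

Only leaf nodes are passed in, because internal nodes do not contribute to the WPL of the tree.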

2. The construction process of the Huffman tree

Given N weights as the weights of the N leaf nodes, many binary trees with different structures can be constructed, and their weighted path lengths are not necessarily the same. Only the binary tree whose weighted path length is the smallest is a Huffman tree.

For example, given the four weights 3, 5, 7, 13 as leaf weights, a binary tree with four leaf nodes can take many different structures. Two of them are shown in the figure below. The weighted path length of the binary tree on the left is 3*3 + 5*2 + 7*2 + 13*2 = 59, and that of the binary tree on the right is 13*1 + 7*2 + 3*3 + 5*3 = 51. In keeping with the property that nodes with larger weights are closer to the root, and therefore have shorter paths, the binary tree on the right is a Huffman tree: its weighted path length has reached the minimum.

So how is a Huffman tree constructed from the given leaf weights? Before constructing it, first derive some general properties of the Huffman tree.

1. To ensure that the constructed binary tree is a Huffman tree, nodes with large weights should have paths that are as short as possible. Conversely, nodes with small weights are placed at the deeper levels of the binary tree, leaving the short paths to the nodes with large weights. From a local point of view, it is enough to ensure that no node has a longer path than a node with a smaller weight, which is why a greedy algorithm can be used.

2. A Huffman tree contains no node with exactly one child. Suppose such a node existed. Deleting it and attaching its child directly to its parent would reduce the path length of every leaf in its subtree by 1, producing a binary tree with a smaller weighted path length. This contradicts the definition of the Huffman tree, so the assumption does not hold.

3. If the Huffman tree has only two leaf nodes, the paths of the two leaf nodes are equal, and both are 1.

Based on these properties, the Huffman tree is constructed in the following steps:

1. Treat the N leaf nodes as a forest of N trees, each consisting of a single root node.

Take 3,5,7,13 as examples.

2. Select the two trees with the smallest root weights from the forest and make them the left and right subtrees of a new tree (the new tree built this way is itself a Huffman tree). The root weight of the new tree is the sum of the root weights of its left and right subtrees. Then delete the two merged trees from the forest and add the new tree to it.

Choose the two smallest weights, 3 and 5, merge them into a Huffman tree with root weight 8, and add the new tree to the forest.

3. Repeat step 2 until only one tree is left in the forest. That last tree is the Huffman tree.

To make the result deterministic, in this article the tree with the smaller root weight becomes the left subtree and the tree with the larger root weight becomes the right subtree in every merge. This ordering is a free choice: any binary tree whose weighted path length reaches the minimum is a Huffman tree, so the Huffman tree for a given set of weights is not unique.

Continue by selecting the two smallest root weights, 7 and 8, and merging them.

The final Huffman tree structure is as follows.

Now verify the weighted path length of the tree: WPL = 13*1 + 7*2 + 3*3 + 5*3 = 51. The larger the weight of a node, the shorter its path, so this is a Huffman tree.
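The merge steps above can be sketched with Python's standard heapq module. This is a minimal illustration that tracks only root weights, not the tree structure, and is not the implementation developed later in the article. The function name huffman_wpl is an assumption made here. It relies on the fact that each merge pushes every leaf under the new root one level deeper, so the sum of all merged root weights equals the WPL.

```python
import heapq


def huffman_wpl(weights):
    """Return the minimum WPL for the given leaf weights.

    Repeatedly merges the two smallest root weights, as in steps 1-3.
    Each merge pushes every leaf under the new root one level deeper,
    so adding up the merged weights yields the WPL.
    """
    forest = list(weights)      # step 1: every weight is a one-node tree
    heapq.heapify(forest)
    total = 0
    while len(forest) > 1:      # step 3: repeat until one tree is left
        smallest = heapq.heappop(forest)
        second = heapq.heappop(forest)
        merged = smallest + second          # step 2: root weight of the new tree
        total += merged
        heapq.heappush(forest, merged)
    return total


print(huffman_wpl([3, 5, 7, 13]))  # 51 (merged roots: 8, 15, 28)
```

A heap removes the need to scan the whole forest for the two smallest roots on every merge.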

3. Python implementation of the Huffman tree

1. Code Preparation

# coding=utf-8
class Node(object):
    def __init__(self, data):
        self.data = data
        self.parent = None
        self.left_child = None
        self.right_child = None
        self.is_in_tree = False


class HuffmanTree(object):
    """Huffman tree"""
    def __init__(self):
        self.__root = None
        self.prefix_branch = '├'
        self.prefix_trunk = '|'
        self.prefix_leaf = '└'
        self.prefix_empty = ''
        self.prefix_left = '─L─'
        self.prefix_right = '─R─'

    def is_empty(self):
        return not self.__root

    @property
    def root(self):
        return self.__root

    @root.setter
    def root(self, value):
        self.__root = value if isinstance(value, Node) else Node(value)

    def show_tree(self):
        if self.is_empty():
            print('Empty binary tree')
            return
        print('-' * 20)
        print(self.__root.data)
        self.__print_tree(self.__root)
        print('-' * 20)

    def __print_tree(self, node, prefix=None):
        if prefix is None:
            prefix, prefix_left_child = '', ''
        else:
            prefix = prefix.replace(self.prefix_branch, self.prefix_trunk)
            prefix = prefix.replace(self.prefix_leaf, self.prefix_empty)
            prefix_left_child = prefix.replace(self.prefix_leaf, self.prefix_empty)
        if self.has_child(node):
            if node.right_child is not None:
                print(prefix + self.prefix_branch + self.prefix_right + str(node.right_child.data))
                if self.has_child(node.right_child):
                    self.__print_tree(node.right_child, prefix + self.prefix_branch + ' ')
            else:
                print(prefix + self.prefix_branch + self.prefix_right)
            if node.left_child is not None:
                print(prefix + self.prefix_leaf + self.prefix_left + str(node.left_child.data))
                if self.has_child(node.left_child):
                    prefix_left_child += '  '
                    self.__print_tree(node.left_child, self.prefix_leaf + prefix_left_child)
            else:
                print(prefix + self.prefix_leaf + self.prefix_left)

    def has_child(self, node):
        return node.left_child is not None or node.right_child is not None

First create a node class Node, which is used to create the nodes of the Huffman tree. Note the flag is_in_tree: constructing the Huffman tree repeatedly selects the two trees with the smallest root weights from the forest and merges them, so each node carries this flag. If it is True, the tree rooted at that node has already been merged into a larger tree and will not be selected again.

Next, create the Huffman tree class HuffmanTree and, to begin with, give it a method show_tree() that prints the tree in a tree-like layout.

Following the construction process described above, implement the method that builds the Huffman tree.

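    def huffman(self, leavers):
        """Construct the Huffman tree from a list of leaf weights."""
        if len(leavers) <= 0:
            return
        if len(leavers) == 1:
            self.root = Node(leavers[0])
            return
        # The forest: every node ever created stays in this list, merged or not.
        woods = list()
        for i in range(len(leavers)):
            woods.append(Node(leavers[i]))
        # A Huffman tree with N leaves has 2*N - 1 nodes in total.
        while len(woods) < 2*len(leavers) - 1:
            # Sentinels with infinite weight; node1 ends up as the smallest
            # unmerged root and node2 as the second smallest.
            node1, node2 = Node(float('inf')), Node(float('inf'))
            for j in range(len(woods)):
                if node1.data > node2.data:
                    node1, node2 = node2, node1
                if woods[j].data < node1.data and woods[j].is_in_tree is False:
                    node1, node2 = woods[j], node1
                elif node1.data <= woods[j].data < node2.data and woods[j].is_in_tree is False:
                    node2 = woods[j]
            # Merge the two roots under a new parent and add it to the forest.
            parent_node = Node(node1.data + node2.data)
            woods.append(parent_node)
            parent_node.left_child, parent_node.right_child = node1, node2
            self.root, node1.parent, node2.parent = parent_node, parent_node, parent_node
            # Mark both roots as merged so they are not selected again.
            node1.is_in_tree, node2.is_in_tree = True, True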

huffman(leavers): constructs a Huffman tree from N given weights. If N <= 0, the method returns immediately. If N = 1, the single weight becomes the root node of the tree. If N >= 2, the N weights first become the root nodes of a forest of N trees; the two trees with the smallest root weights are then merged repeatedly until only one tree is left.

This method has a few points that are easy to get wrong:

1. For convenience, merged trees are not deleted from the list woods (deletion is troublesome, especially when weights are equal). Instead, the flag is_in_tree of the root node is set: if is_in_tree is True, the tree has already been merged and will not be selected again.

So how do we know when the Huffman tree is complete? It is complete when exactly one root node in woods has is_in_tree set to False, but checking this would mean scanning the flags in woods after every merge. A simpler test follows from the properties analyzed above: with the greedy algorithm, every merged tree is locally a Huffman tree, and a Huffman tree has no node with only one child. During construction, every node is appended to woods as the root of some tree, so the length of woods equals the number of nodes created so far. When the length of woods reaches the total number of nodes in the Huffman tree, construction is finished.

In a Huffman tree, every node other than a leaf has two children. By the properties of binary trees, when the number of leaf nodes is N, the number of nodes with two children is N-1, so a Huffman tree with N leaf nodes has 2*N - 1 nodes in total. This is the bound used in the while loop of huffman().
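The node count can be checked by simulating the merges. A minimal sketch, assuming only that each merge removes two roots from the forest and creates one new parent node; the function name count_huffman_nodes is hypothetical and not part of the article's code.

```python
def count_huffman_nodes(leaf_count):
    """Count the nodes created when merging `leaf_count` single-node trees into one."""
    roots = leaf_count   # trees currently in the forest
    nodes = leaf_count   # nodes created so far (the leaves)
    while roots > 1:
        roots -= 1       # two roots are replaced by one merged root
        nodes += 1       # the merge creates one new parent node
    return nodes


# Compare the simulated count with the closed form 2*N - 1.
for n in (2, 4, 6):
    print(n, count_huffman_nodes(n), 2 * n - 1)
```

N leaves need N-1 merges to become a single tree, which gives N + (N-1) = 2*N - 1 nodes.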

2. To find the two root nodes with the smallest weights, two variables node1 and node2 are declared in advance and initialized with large weights. Positive infinity, float('inf'), is recommended. If they were instead initialized with the first and second elements of woods, and one of those elements happened to be the minimum, the same node could be selected in every pass of the loop.

3. While searching for the two smallest root weights, let node be the current node. If node.data < node1.data, node1 is assigned to node2 and node is assigned to node1. If node1.data <= node.data < node2.data, node is assigned to node2. The second case is easy to overlook.
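The selection logic in points 2 and 3 can be isolated from the tree code. The following is a minimal sketch on a plain list of numbers, assuming no is_in_tree flag is needed; the function name two_smallest is illustrative only.

```python
def two_smallest(values):
    """Return the two smallest values in one pass, smallest first."""
    first, second = float('inf'), float('inf')
    for value in values:
        if value < first:
            # New minimum: the old minimum becomes the runner-up.
            first, second = value, first
        elif value < second:
            # Between the two (or equal to the minimum): replace the runner-up only.
            second = value
    return first, second


print(two_smallest([11, 5, 7, 13, 17, 11]))  # (5, 7)
print(two_smallest([11, 13, 17, 11, 12]))    # (11, 11)
```

The second call shows why the elif branch matters: without it, the duplicate 11 would never be picked up as the runner-up.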

if __name__ == '__main__':
    tree = HuffmanTree()
    leavers = [11, 5, 7, 13, 17, 11]
    tree.huffman(leavers)
    tree.show_tree()

Output:

--------------------
64
├─R─39
| ├─R─22
| | ├─R─11
| | └─L─11
| └─L─17
└─L─25
  ├─R─13
  └─L─12
    ├─R─7
    └─L─5
--------------------

For the leaf weights [11, 5, 7, 13, 17, 11], the resulting Huffman tree has the structure shown in the figure below.

The weighted path length of the Huffman tree is WPL = 13*2 + 17*2 + 5*3 + 7*3 + 11*3 + 11*3 = 162.
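The WPL above can also be checked by walking the tree. A minimal sketch, assuming the printed tree is written out by hand as nested (left, right) tuples with integer leaves rather than as the article's Node objects; the name wpl is illustrative.

```python
def wpl(tree, depth=0):
    """Return the WPL of a tree given as nested (left, right) tuples with int leaves."""
    if isinstance(tree, tuple):
        left, right = tree
        return wpl(left, depth + 1) + wpl(right, depth + 1)
    return tree * depth  # leaf: weight times path length


# Shape of the printed tree: 64 -> (25, 39), 25 -> (12, 13), 12 -> (5, 7),
# 39 -> (17, 22), 22 -> (11, 11).
printed_tree = (((5, 7), 13), (17, (11, 11)))
print(wpl(printed_tree))  # 162
```

Only the leaves contribute to the result; the internal weights 12, 22, 25, 39 and 64 are implied by the nesting.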

 

 


Origin blog.csdn.net/weixin_43790276/article/details/105890968