Data structure - Huffman tree (Python implementation)

In earlier articles we introduced the general binary tree, the complete binary tree, and the full binary tree; in this article we introduce the Huffman tree.
The Huffman tree is also called the optimal binary tree. Closely associated with the Huffman tree is the concept of Huffman coding, and the two are essentially the same idea. Huffman coding was proposed by Huffman in 1952 and is now widely applied to text compression. Next we will explain what exactly a Huffman tree is, what Huffman coding is, and how it can be applied to text compression.

Huffman Tree

Given n weights as n leaf nodes, construct a binary tree. If the weighted path length of the tree reaches a minimum, the binary tree is called an optimal binary tree, also known as a Huffman tree (Huffman Tree). The Huffman tree is the binary tree with the shortest weighted path length: the larger a node's weight, the closer it sits to the root.
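
To make "weighted path length" concrete: if the tree has n leaf nodes with weights w1, w2, …, wn, and li is the length of the path from the root down to the i-th leaf, then the weighted path length is

WPL = w1 × l1 + w2 × l2 + … + wn × ln

and the Huffman tree is the binary tree that minimizes this sum for the given weights.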

First, we have the following data:

sourceData = [('a', 8), ('b', 5), ('c', 3), ('d', 3), ('e', 8), ('f', 6), ('g', 2), ('h', 5), ('i', 9), ('j', 5), ('k', 7), ('l', 5), ('m', 10), ('n', 9)]

Each item is a tuple: the first element of the tuple is the data itself, and the second is its weight. In other words, the data used to build a Huffman tree comes with weights. Suppose the letters in this data set are letters of the alphabet and the weights are based on their probability of occurrence in some piece of text: the higher the probability that a letter appears, the larger that letter's weight. For example, the letter a has a weight of 8.

Now, with this data, we can build the Huffman tree.

  1. Find the two elements with the smallest weights among all the elements, namely g (2) and c (3).
  2. Construct a binary tree with g and c as the child nodes; the weight of the newly constructed parent node is 2 + 3 = 5.
  3. Among the remaining elements (excluding g and c) together with the newly constructed node of weight 5, again pick the two nodes with the smallest weights.
  4. Repeat step 2 for those two nodes.

Continue in this way until everything has been merged into a single binary tree; that tree is the Huffman tree.

Let's illustrate the process with a diagram:

With that, our Huffman tree is built. In the diagram, the number after each letter is the weight of that letter, i.e. the weight given in the data above. One thing I want to emphasize: the Huffman tree built from a given set of data is not unique. As long as you follow the rules at each step and make no mistakes, your Huffman tree is correct.

Now define going to a left child as 0 and going to a right child as 1. Following the path to the letter a gives the code 0110, and the path to the letter n gives the code 111. This encoding is Huffman coding.

Comparing the Huffman codes of the different letters, what do you notice?

The larger a letter's weight, the shorter its Huffman code; the smaller the weight, the longer the code. In other words, letters that appear in the text with high probability get short codes, and letters that appear with low probability get long codes. Representing the letters of the text with this encoding therefore shortens the total encoded length of the text.

This is how the Huffman tree, and hence Huffman coding, is applied to text compression.
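
As a rough illustration (treating the weights above as raw character counts, which is an assumption made here just for the example): the 14 letters have a total weight of 85, and a fixed-length code for 14 distinct symbols needs 4 bits per symbol, so the text would take 85 × 4 = 340 bits. Huffman coding gives the high-weight letters shorter codes at the cost of longer codes for the rarest letters, and a quick hand calculation puts the weighted total at about 317 bits; the exact codes depend on how ties are broken, but the minimum total does not.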

Now let's implement it in code:

Define a binary tree class:

class BinaryTree:
    def __init__(self, data, weight):
        self.data = data
        self.weight = weight
        self.left = None
        self.right = None

Get the two nodes with the smallest weights from a list of nodes:

# Return the two nodes with the smallest weights, plus a list of the remaining nodes
def min2(li):
    # result[0] is the smallest node seen so far, result[1] the second smallest;
    # both start as placeholder nodes with infinite weight
    result = [BinaryTree(None, float('inf')), BinaryTree(None, float('inf'))]
    li2 = []    # all nodes that are not among the two smallest
    for i in range(len(li)):
        if li[i].weight < result[0].weight:
            # new smallest: the displaced second-smallest node goes back into li2
            if result[1].weight != float('inf'):
                li2.append(result[1])
            result[0], result[1] = li[i], result[0]
        elif li[i].weight < result[1].weight:
            # new second smallest: the displaced node goes back into li2
            if result[1].weight != float('inf'):
                li2.append(result[1])
            result[1] = li[i]
        else:
            li2.append(li[i])
    return result, li2
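
As a quick sanity check (a made-up example, not from the original article), min2 splits a list of nodes into the two smallest and the rest:

nodes = [BinaryTree('x', 8), BinaryTree('y', 3), BinaryTree('z', 5)]
(first, second), rest = min2(nodes)
print(first.weight, second.weight)   # 3 5
print([n.weight for n in rest])      # [8]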

Define the method that builds the Huffman tree:

def makeHuffman(source):
    # Take out the two nodes with the smallest weights; data holds everything else
    m2, data = min2(source)
    print(m2[0].data, m2[1].data)    # show which two nodes get merged at this step
    left = m2[0]
    right = m2[1]

    # The parent's weight is the sum of its two children's weights
    sumLR = left.weight + right.weight
    father = BinaryTree(None, sumLR)
    father.left = left
    father.right = right
    if data == []:
        return father    # nothing left to merge: father is the root of the Huffman tree
    data.append(father)
    return makeHuffman(data)
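
The makeHuffman above rescans the remaining nodes with min2 on every merge. Purely as a point of comparison (this is my own sketch, not the article's approach), the same construction can be written with Python's standard heapq module, which keeps the nodes in a priority queue:

import heapq
import itertools

def makeHuffmanHeap(nodes):
    # Alternative sketch: build the Huffman tree with a priority queue.
    # The counter breaks ties so BinaryTree objects are never compared directly.
    counter = itertools.count()
    heap = [(node.weight, next(counter), node) for node in nodes]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # smallest weight
        w2, _, right = heapq.heappop(heap)   # second-smallest weight
        father = BinaryTree(None, w1 + w2)
        father.left = left
        father.right = right
        heapq.heappush(heap, (father.weight, next(counter), father))
    return heap[0][2]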

Define a breadth-first traversal method:

# Breadth-first traversal implemented recursively
def breadthFirst(gen, index=0, nextGen=None, result=None):
    # Use None defaults instead of mutable lists so repeated calls start fresh
    if nextGen is None:
        nextGen = []
    if result is None:
        result = []
    if type(gen) == BinaryTree:
        gen = [gen]    # wrap the root so the first call also works on a list
    result.append((gen[index].data, gen[index].weight))
    if gen[index].left is not None:
        nextGen.append(gen[index].left)
    if gen[index].right is not None:
        nextGen.append(gen[index].right)

    if index == len(gen) - 1:      # finished the current level
        if nextGen == []:
            return result          # no further level: the traversal is complete
        gen = nextGen              # move down to the next level
        nextGen = []
        index = 0
    else:
        index += 1
    breadthFirst(gen, index, nextGen, result)

    return result
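
For comparison (again my own sketch, not part of the article's code), the same breadth-first traversal can be written iteratively with collections.deque, which avoids deep recursion on large trees:

from collections import deque

def breadthFirstIterative(root):
    # Visit nodes level by level using a FIFO queue
    result = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        result.append((node.data, node.weight))
        if node.left is not None:
            queue.append(node.left)
        if node.right is not None:
            queue.append(node.right)
    return result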

Input data:

# Weights assigned to some letters according to their probability of occurrence in a piece of text
sourceData = [('a', 8), ('b', 5), ('c', 3), ('d', 3), ('e', 8), ('f', 6), ('g', 2), ('h', 5), ('i', 9), ('j', 5), ('k', 7), ('l', 5), ('m', 10), ('n', 9)]
sourceData = [BinaryTree(x[0], x[1]) for x in sourceData]

Create the Huffman tree and perform a breadth-first traversal of it:

huffman = makeHuffman(sourceData)
print(breadthFirst(huffman))
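
The article reads the codes off the tree by hand (left edge = 0, right edge = 1) but never prints them. As a small illustration, here is a helper I am adding myself — collectCodes is my own name, not part of the article — that walks the tree built above and collects each letter's code. Note that the exact codes may differ from the 0110 / 111 given earlier, because the Huffman tree for this data is not unique:

def collectCodes(node, prefix='', table=None):
    # Walk the tree, appending '0' for a left edge and '1' for a right edge.
    # Only leaf nodes carry letters, so only leaves get an entry in the table.
    if table is None:
        table = {}
    if node.left is None and node.right is None:
        table[node.data] = prefix
        return table
    if node.left is not None:
        collectCodes(node.left, prefix + '0', table)
    if node.right is not None:
        collectCodes(node.right, prefix + '1', table)
    return table

codes = collectCodes(huffman)
print(codes)
# Total encoded length of the text = sum of weight * code length over all letters
print(sum(node.weight * len(codes[node.data]) for node in sourceData))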

OK, that's it for our introduction to the Huffman tree. If there is anything you still don't understand, remember to leave me a comment.
