1. What is a Huffman tree

To convert a percentage (0-100) test score into a five-point grade, we can use a program like the one below:

/* C implementation */

if( score < 60 )  grade =1;
else if( score < 70 ) grade =2; 
else if( score < 80 ) grade =3; 
else if( score < 90 ) grade =4;
else grade =5;

From the code above, we can construct the decision tree shown in the following figure:

Now consider the probability distribution of student scores across the five grades, as shown below:

From this probability distribution and the decision tree, the search efficiency (the expected number of comparisons needed to grade one student) is:
\[0.05 \times 1 + 0.15 \times 2 + 0.4 \times 3 + 0.3 \times 4 + 0.1 \times 4 = 3.15\]
The probability distribution shows that most students score in the 70-79 and 80-89 ranges, yet these are the grades that take the most comparisons to reach. We can therefore modify the code and the decision tree as follows:

/* C implementation */

if( score < 80 )
{
  if( score < 70 )
  {
    if( score < 60 ) grade = 1;
    else grade = 2;
  }
  else grade = 3;
}
else if( score < 90 ) grade = 4;
else grade = 5;

With this modification, the search efficiency becomes:
\[0.05 \times 3 + 0.15 \times 3 + 0.4 \times 2 + 0.3 \times 2 + 0.1 \times 2 = 2.2\]
This example raises a question: how can we build a more efficient search tree based on how frequently each node is looked up?
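
The two search costs above can be checked with a short script. This is only a minimal sketch: the probabilities and comparison counts are read off the two sums above, and the names `probabilities`, `comparisons_original`, `comparisons_modified` and `expected_comparisons` are introduced here for illustration.

```python
# Probability of each grade (1..5), taken from the two sums above.
probabilities = [0.05, 0.15, 0.40, 0.30, 0.10]

# Number of comparisons needed to reach each grade in the two decision trees.
comparisons_original = [1, 2, 3, 4, 4]
comparisons_modified = [3, 3, 2, 2, 2]


def expected_comparisons(probs, depths):
    """Return the expected number of comparisons: the sum of p_k * depth_k."""
    return sum(p * d for p, d in zip(probs, depths))


cost_original = expected_comparisons(probabilities, comparisons_original)
cost_modified = expected_comparisons(probabilities, comparisons_modified)
print(round(cost_original, 2), round(cost_modified, 2))
```

Moving the two most likely grades closer to the root is what lowers the expected cost, even though the two rarest grades now take one or two more comparisons.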

1.1 Definition of the Huffman tree

Weighted path length (WPL): suppose a binary tree has n leaf nodes, each leaf node has a weight \(w_k\), and the length of the path from the root to that leaf node is \(l_k\). The weighted path length of the tree is the sum over all leaf nodes: \(WPL = \sum_{k=1}^{n} w_k l_k\)

Optimal binary tree, or Huffman tree: the binary tree with the smallest WPL.

Example: given five leaf nodes with weights {1, 2, 3, 4, 5}, binary trees of many different shapes can be built from this same set of weights.
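
To make the definition concrete, the sketch below computes the WPL of three possible shapes over the weights {1, 2, 3, 4, 5}. The nested-tuple representation and the names `wpl`, `chain_heavy_deep`, `chain_light_deep` and `merged_smallest` are assumptions made for this illustration; they do not come from the original figures.

```python
def wpl(tree, depth=0):
    """Weighted path length of a tree given as nested 2-tuples with int leaves."""
    if isinstance(tree, tuple):
        left, right = tree
        return wpl(left, depth + 1) + wpl(right, depth + 1)
    return tree * depth  # leaf: weight times its distance from the root


# Three hypothetical shapes over the same weights {1, 2, 3, 4, 5}.
chain_heavy_deep = ((((5, 4), 3), 2), 1)  # heaviest leaves farthest from the root
chain_light_deep = ((((1, 2), 3), 4), 5)  # lightest leaves farthest from the root
merged_smallest = (((1, 2), 3), (4, 5))   # built by repeatedly merging the two smallest weights

for shape in (chain_heavy_deep, chain_light_deep, merged_smallest):
    print(shape, "WPL =", wpl(shape))
```

The three shapes give WPL values of 50, 34 and 33 respectively, so the shape of the tree matters even when the weights are identical.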

2. Constructing a Huffman tree

At each step, merge the two binary trees with the smallest weights.

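/* C implementation */

#include <stdlib.h>

/* MinHeap, BuildMinHeap, DeleteMin and Insert are assumed to be defined elsewhere:
   a min-heap of HuffmanTree pointers ordered by Weight. */

typedef struct TreeNode *HuffmanTree;
struct TreeNode{
  int Weight;
  HuffmanTree Left, Right;
};

HuffmanTree Huffman( MinHeap H )
{
  // Assume the H->Size weights are already stored in H->Elements[]->Weight
  int i, N; HuffmanTree T;
  BuildMinHeap(H); // Rearrange H->Elements[] into a min-heap ordered by weight
  N = H->Size; // Save the count first: H->Size changes as nodes are deleted and inserted
  for (i = 1; i < N; i++)
  {
    // Perform N-1 merges
    T = malloc(sizeof(struct TreeNode)); // Create a new node
    T->Left = DeleteMin(H); // Remove a node from the min-heap as the left child of the new T
    T->Right = DeleteMin(H); // Remove a node from the min-heap as the right child of the new T
    T->Weight = T->Left->Weight+T->Right->Weight; // Compute the new weight
    Insert(H, T); // Insert the new T into the min-heap
  }
  T = DeleteMin(H);
  return T;
}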

# Python implementation

# Node class
class Node(object):
    def __init__(self, name=None, value=None):
        self._name = name
        self._value = value
        self._left = None
        self._right = None


# Huffman tree class
class HuffmanTree(object):

    # Following the idea of the Huffman tree: start from the leaf nodes and build the tree bottom-up
    def __init__(self, char_weights):
        self.a = [Node(part[0], part[1]) for part in char_weights]  # Create leaf nodes from the input characters and their frequencies
        while len(self.a) != 1:
            self.a.sort(key=lambda node: node._value, reverse=True)
            c = Node(value=(self.a[-1]._value + self.a[-2]._value))
            c._left = self.a.pop(-1)
            c._right = self.a.pop(-1)
            self.a.append(c)
        self.root = self.a[0]
        self.b = [0] * len(char_weights)  # self.b holds the code bits of the leaf being visited; its length only needs to be at least the depth of the tree

    # Generate the codes recursively
    def pre(self, tree, length):
        node = tree
        if (not node):
            return
        elif node._name:
            print(node._name + ' is encoded as:')
            for i in range(length):
                print(self.b[i], end='')  # Print the bits of one code on a single line
            print()
            return
        self.b[length] = 0
        self.pre(node._left, length + 1)
        self.b[length] = 1
        self.pre(node._right, length + 1)

    # Generate the Huffman codes
    def get_code(self):
        self.pre(self.root, 0)


if __name__ == '__main__':
    # The input is a list of characters and their frequencies
    char_weights = [('a', 5), ('b', 4), ('c', 10), ('d', 8), ('f', 15), ('g', 2)]
    tree = HuffmanTree(char_weights)
    tree.get_code()

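The heap-based construction in the C code above takes \(O(N \log N)\) time: it performs N-1 merges, and each DeleteMin or Insert on the heap costs \(O(\log N)\). The Python version re-sorts the list before every merge, so it is simpler but does more work per merge.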
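
For comparison, the heap-based idea can also be sketched in Python with the standard `heapq` module. This is an illustrative variant, not the code above: the function name `huffman_codes`, the `(weight, tie_breaker, tree)` heap entries and the nested-tuple tree are assumptions of this sketch, and symbols are assumed to be strings rather than tuples.

```python
import heapq
from itertools import count


def huffman_codes(char_weights):
    """Build a Huffman tree with a binary heap and return {symbol: code}."""
    tie = count()  # tie-breaker so the heap never has to compare tree payloads
    # Heap entries are (weight, tie_breaker, tree); a tree is a symbol or a (left, right) pair.
    heap = [(weight, next(tie), symbol) for symbol, weight in char_weights]
    heapq.heapify(heap)  # O(N)
    while len(heap) > 1:  # N - 1 merges, each O(log N)
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), (left, right)))

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")  # left edge contributes a 0
            walk(tree[1], prefix + "1")  # right edge contributes a 1
        else:
            codes[tree] = prefix or "0"  # a lone symbol still gets a one-bit code

    if heap:
        walk(heap[0][2], "")
    return codes


char_weights = [('a', 5), ('b', 4), ('c', 10), ('d', 8), ('f', 15), ('g', 2)]
codes = huffman_codes(char_weights)
total_bits = sum(weight * len(codes[symbol]) for symbol, weight in char_weights)
print(codes, total_bits)
```

Returning a dictionary instead of printing makes the result easy to check; `total_bits` is the WPL of the resulting tree for the sample input used earlier.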

2.1 Properties of the Huffman tree

There are no nodes of degree 1;
A Huffman tree with n leaf nodes has 2n-1 nodes in total;
Swapping the left and right subtrees of any non-leaf node of a Huffman tree still yields a Huffman tree;
Can the same set of weights \(\{w_1, w_2, \cdots, w_n\}\) produce two Huffman trees with different structures?
- Yes. For the weights {1, 2, 3, 3}, two structurally different Huffman trees exist, as shown below:
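
These properties can be checked on the {1, 2, 3, 3} example with a small sketch. The nested-tuple trees `skewed`, `balanced` and `mirrored` below are assumed shapes written out by hand for this illustration (the original figure is not reproduced here), and `wpl` and `count_nodes` are hypothetical helper names. Because every internal node is a 2-tuple, no node of degree 1 can occur in this representation.

```python
def wpl(tree, depth=0):
    """Weighted path length of a tree given as nested 2-tuples with int leaves."""
    if isinstance(tree, tuple):
        return wpl(tree[0], depth + 1) + wpl(tree[1], depth + 1)
    return tree * depth


def count_nodes(tree):
    """Return (number of leaves, total number of nodes)."""
    if isinstance(tree, tuple):
        left_leaves, left_total = count_nodes(tree[0])
        right_leaves, right_total = count_nodes(tree[1])
        return left_leaves + right_leaves, left_total + right_total + 1
    return 1, 1


skewed = (3, (3, (1, 2)))    # the merged node {1, 2} is paired with one leaf of weight 3
balanced = ((1, 2), (3, 3))  # the two leaves of weight 3 are paired with each other
mirrored = ((3, 3), (1, 2))  # balanced with the root's subtrees swapped

for tree in (skewed, balanced, mirrored):
    leaves, total = count_nodes(tree)
    print(tree, "WPL =", wpl(tree), "leaves =", leaves, "nodes =", total)
```

All three trees have a WPL of 18 and 7 = 2 × 4 - 1 nodes. The tie arises after merging 1 and 2: three trees of weight 3 remain, and any two of them may be merged next.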