table of Contents
Newer and more comprehensive "data structures and algorithms," the update site, more python, go, waiting for you teaching artificial intelligence: https://www.cnblogs.com/nickchen121/p/11407287.html
First, what is Huffman (Huffman Tree)
If we percentile convert test scores into a five-point scale of achievement, we can use the program as shown below:
/* c语言实现 */
if( score < 60 ) grade =1;
else if( score < 70 ) grade =2;
else if( score < 80 ) grade =3;
else if( score < 90 ) grade =4;
else grade =5;
The above code, we can construct the decision tree shown in the following figure:
If the results of the above-mentioned five-point scale, we consider student achievement probability distribution , as shown below:
Probability by Student Records Distribution and said decision tree, we can get student performance search efficiency is:
\ [0.05 ×. 1 + 0.15 × 2 + 0.4 ×. 3 + 0.3 ×. 4 + 0.1 ×. 4 = 3.15 \]
from student performance probability distribution, we can see more 70-79 and 80-89 in the distribution of students, however, find their efficiency is really low, so we can follow the following ways to modify the code and decision tree :
/* c语言实现 */
if( score < 80 )
{
if( score < 70 );
if( score < 60 ) {
grade =1;
} else grade = 2;
else grad=3;
}
else if( score < 90 ) grade =4;
else grade =5;
With this modification, the student performance search efficiency is:
\ [× 0.05 + 0.15 ×. 3. 3 + 0.4 × 2 + 0.3 × 2 + 0.1 × 2 = 2.2 \]
Through the above example, we can think about a problem: how the Find a different node frequency to construct a more efficient search tree?
1.1 Definitions Huffman tree
Weighted path length (WPL): provided binary tree has n number of leaf nodes , each leaf node with the right values \ (W_k \) , the length from the root to each leaf node is \ (L_K \) , each leaf node of the weighted path length is the sum of: \ (] WPL = \ sum_ K = {}. 1 nw_kl_k ^ \)
Optimal binary or Huffman tree: WPL smallest binary tree
Example: There are five leaf nodes, their weight is {1, 2, 3, 4, 5}, with a sequence of values of this plurality of binary weights may be constructed of different shapes.
Second, the Huffman tree structure
Every time the minimum weight of two binary merger
/* c语言实现 */
typedef struct TreeNode *HuffmanTree;
struct TreeNode{
int Weight;
HuffmanTree Left, Right;
}
HuffmanTree Huffman( MinHeap H )
{
// 假设H->Size个权值已经存在H->Elements[]->Weight里
int i; HuffmanTree T;
BuildMinHeap(H); // 将H->Elements[]按权值调整为最小堆
for (i = 1; i < H->Size; i++)
{
// 做H->Size-1次合并
T = malloc(sizeof(struct TreeNode)); // 建立新结点
T->Left = DeleteMin(H); // 从最小堆中删除一个结点,作为新T的左子结点
T->Right = DeleteMin(H); // 从最小堆中删除一个结点,作为新T的右子结点
T->Weight = T->Left->Weight+T->Right->Weight; // 计算新权值
Insert(H, T); // 将新T插入最小堆
}
T = DeleteMin(H);
return T;
}
# python语言实现
# 节点类
class Node(object):
def __init__(self, name=None, value=None):
self._name = name
self._value = value
self._left = None
self._right = None
# 哈夫曼树类
class HuffmanTree(object):
# 根据Huffman树的思想:以叶子节点为基础,反向建立Huffman树
def __init__(self, char_weights):
self.a = [Node(part[0], part[1]) for part in char_weights] # 根据输入的字符及其频数生成叶子节点
while len(self.a) != 1:
self.a.sort(key=lambda node: node._value, reverse=True)
c = Node(value=(self.a[-1]._value + self.a[-2]._value))
c._left = self.a.pop(-1)
c._right = self.a.pop(-1)
self.a.append(c)
self.root = self.a[0]
self.b = list(range(10)) # self.b用于保存每个叶子节点的Haffuman编码,range的值只需要不小于树的深度就行
# 用递归的思想生成编码
def pre(self, tree, length):
node = tree
if (not node):
return
elif node._name:
print(node._name + '的编码为:')
for i in range(length):
print(self.b[i])
print()
return
self.b[length] = 0
self.pre(node._left, length + 1)
self.b[length] = 1
self.pre(node._right, length + 1)
# 生成哈夫曼编码
def get_code(self):
self.pre(self.root, 0)
if __name__ == '__main__':
# 输入的是字符及其频数
char_weights = [('a', 5), ('b', 4), ('c', 10), ('d', 8), ('f', 15), ('g', 2)]
tree = HuffmanTree(char_weights)
tree.get_code()
The time complexity of the above process is: O (logN N)
2.1 Huffman tree features
- No degree of node 1;
- A total of 2n-1 Huffman tree nodes leaf node of the n
- Any non-leaf node of the Huffman tree of the left and right subtrees exchange after still Huffman tree;
- The same set of weights \ ({W_1, W_2, \ cdots, w_n} \) , the existence of different configurations of two Huffman tree it?
- {1, 2, 3, 3}, as shown below may have a set of weights to a different configuration of two Huffman tree:
Third, Huffman coding
Given a string, how to encode characters can be encoded so that the string of storage space at least ?
例:假设有一段文本,包含58个字符,并由以下7个字符构成:a,e,i,s,t,空格(sp),换行(nl)。这7个字符出现的次数不同。如何对这7个字符进行编码,使得总编码空间最少?
分析:
- 用等长ASCCII编码:58*8 = 464位;
- 用等长3位编码:58*3 = 174位;
- 不等长编码:出现频率高的字符用的编码短些,出现频率低的字符可以编码长些?
对于上述问题,我们如果使用下图所示方式进行不等长编码:
很明显,可以发现上图所示的不等长编码具有二义性,因此我们可以使用前缀码的方式解决二义性问题。
前缀码(prefix code):任何字符的编码都不是另一字符编码的前缀。
3.1 使用二叉树编码
使用二叉树编码,我们需要注意以下两个问题:
- 左右分支:0、1
- 字符只在叶结点上
假设四个字符的频率为:a:4,u:1,x:2,z:1;那么我们可以使用最普通的二叉树对这四个字符进行编码,如下图所示:
通过上图可以发现,我们使用最偷懒的方式,把四个字符放在上述二叉树的四个叶子结点上,因此我们可以考虑使用如下所示的方法,降低二叉树的编码代价:
综上:哈夫曼编码需要解决的一个问题为——如何构造一颗编码代价最小的二叉树。
3.2 使用哈夫曼树编码
对于我们提出来的7个字符的例子,如果我们得知这7个字符的分布概率为如下图所示:
我们可以使用构造哈夫曼树的方式,进行哈夫曼编码,编码流程如下:
最终结果如下图所示: