The principle and construction method of Huffman tree

Table of contents

1. What is a Huffman tree

2. Why is there a Huffman tree

3. The principle of Huffman tree

3.1 Construction method of Huffman tree

3.2 Huffman decoding

3.3 Several definitions

4. Characteristics of Huffman binary tree

5. Code about Huffman tree


1. What is a Huffman tree

  • The Huffman tree solves a coding problem. Given N weights as N leaf nodes, a binary tree is constructed. If the weighted path length of the tree is the minimum possible, the tree is called an optimal binary tree, also known as a Huffman tree. It is the tree with the shortest weighted path length, and nodes with larger weights are closer to the root. Put simply, it finds the shortest binary encoding needed to store a string of characters when every character gets its own code.

2. Why is there a Huffman tree

  • As shown in the figure below, the encoding we store in T(S) is relatively long and cumbersome. Is there a more concise way?

3. The principle of Huffman tree 

3.1 Construction method of Huffman tree

  • The following method can be used: first, count the number of occurrences of each element.

  • Step 1: Find the two characters with the smallest frequencies and join them under a new parent node, the smaller one on the left and the larger one on the right. Delete these two entries from the frequency table and add a new entry whose frequency is their sum. In the example, E and D are the two smallest, so they form the first binary tree.

  • Step 2: Repeat in the same way: find the two smallest entries (characters or already merged subtrees), put the smaller one on the left and the larger one on the right, and merge them, until a single binary tree remains.

  • Step 3: Mark each left branch as 0 and each right branch as 1.

  • Step 4: The binary code of each character is obtained by concatenating the branch values along the path from the root node to that character's leaf node.

  • Is the generated H(S) now much shorter than T(S)?
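The four steps above can be sketched compactly in C++. This is only an illustration under stated assumptions: it uses a min-heap (`std::priority_queue`) instead of the array scan in the program in section 5, the names `Node`, `buildHuffmanTree`, `assignCodes` and `huffmanCodes` are made up for this sketch, and the frequency table in the usage note is hypothetical rather than the one from the article's figure.

```cpp
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// One tree node. Leaves carry a symbol; internal nodes only a weight.
struct Node {
    long weight;
    char symbol;  // meaningful only for leaves
    int left;     // index of the left child, -1 for a leaf
    int right;    // index of the right child, -1 for a leaf
};

// Steps 1 and 2: repeatedly merge the two lightest subtrees.
// The root ends up as the last element of the returned vector.
std::vector<Node> buildHuffmanTree(const std::map<char, long>& freq) {
    typedef std::pair<long, int> Item;  // (weight, node index)
    std::vector<Node> nodes;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item> > heap;
    for (const auto& kv : freq) {
        nodes.push_back({kv.second, kv.first, -1, -1});
        heap.push(Item(kv.second, (int)nodes.size() - 1));
    }
    while (heap.size() > 1) {
        Item a = heap.top(); heap.pop();  // smallest: becomes the left child
        Item b = heap.top(); heap.pop();  // second smallest: right child
        nodes.push_back({a.first + b.first, '\0', a.second, b.second});
        heap.push(Item(a.first + b.first, (int)nodes.size() - 1));
    }
    return nodes;
}

// Steps 3 and 4: left branch appends '0', right branch appends '1'.
void assignCodes(const std::vector<Node>& nodes, int idx,
                 const std::string& prefix,
                 std::map<char, std::string>& codes) {
    const Node& n = nodes[idx];
    if (n.left == -1) {
        // A tree with a single leaf has no branches; "0" is an arbitrary choice.
        codes[n.symbol] = prefix.empty() ? "0" : prefix;
        return;
    }
    assignCodes(nodes, n.left, prefix + '0', codes);
    assignCodes(nodes, n.right, prefix + '1', codes);
}

std::map<char, std::string> huffmanCodes(const std::map<char, long>& freq) {
    std::map<char, std::string> codes;
    if (freq.empty()) return codes;
    std::vector<Node> nodes = buildHuffmanTree(freq);
    assignCodes(nodes, (int)nodes.size() - 1, "", codes);
    return codes;
}
```

With the hypothetical frequencies a:45, b:13, c:12, d:16, e:9, f:5, calling `huffmanCodes` yields a=0, b=101, c=100, d=111, e=1101, f=1100. Equal weights are ordered by node index here; a different tie rule can give different codes with the same total length.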

3.2 Huffman decoding

  • Scan the encoded bits from left to right, starting at the root of the binary tree: take the left branch on 0 and the right branch on 1. When a leaf node is reached, output its original value and return to the root.

  • Question: can a long code be confused with a short code? No. Every character sits at a leaf node, so no code is a prefix of another; each walk from the root ends at exactly one leaf, and decoding never has to backtrack.
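The decoding walk can be sketched as follows. This is a minimal illustration, not the article's code: it rebuilds a decoding tree from a code table and then follows the bits from the root. The names `TrieNode`, `buildDecodeTree` and `decode` are assumptions of this sketch, and the code table in the usage note is hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

struct TrieNode {
    int child[2];  // child[0] follows bit '0', child[1] follows bit '1'; -1 if absent
    char symbol;   // valid only when leaf is true
    bool leaf;
};

// Rebuild the binary tree from a table of (character, code) pairs.
// Node 0 is the root.
std::vector<TrieNode> buildDecodeTree(const std::map<char, std::string>& codes) {
    std::vector<TrieNode> t;
    t.push_back({{-1, -1}, '\0', false});
    for (const auto& kv : codes) {
        int cur = 0;
        for (char bit : kv.second) {
            int b = bit - '0';
            if (t[cur].child[b] == -1) {
                t.push_back({{-1, -1}, '\0', false});
                t[cur].child[b] = (int)t.size() - 1;
            }
            cur = t[cur].child[b];
        }
        t[cur].symbol = kv.first;
        t[cur].leaf = true;
    }
    return t;
}

// Walk from the root; emit a character at every leaf and restart at the root.
std::string decode(const std::vector<TrieNode>& t, const std::string& bits) {
    std::string out;
    int cur = 0;
    for (char bit : bits) {
        if (bit != '0' && bit != '1') break;  // stop on anything that is not a bit
        cur = t[cur].child[bit - '0'];
        if (cur == -1) break;                 // no such branch: input is not valid
        if (t[cur].leaf) {
            out += t[cur].symbol;
            cur = 0;
        }
    }
    return out;
}
```

For the hypothetical table a=0, b=10, c=11, `decode(tree, "010110")` returns "abca". No lookahead is needed, because no code in the table is a prefix of another.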

3.3 Several definitions

  • Path: A path is the route formed by the branches from one node to another in a tree. Its length is the number of branches on it.
  • Tree path length: The path length of a tree is the sum of the path lengths from the root to each node.
  • Weighted path length: A node has a weight, and the path length from the node to the root multiplied by the weight of the node is the weighted path length of the node. For example: E's weighted path length = 4 × 2 = 8
  • Weighted path length (WPL) of a tree: The weighted path length (WPL) of a tree is the sum of the weighted path lengths of all leaf nodes in the tree. For example: WPL = 1×5 + 3×2 + 2×3 + 2×4 + 1×4 = 29
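The WPL definition translates directly into code. The sketch below shows two ways to get the number, both with names invented for this example: `weightedPathLength` sums weight times depth for given leaves, and `minimalWPL` computes the WPL of a Huffman tree without building it, using the fact that every merge adds the merged weight once for each level the merged leaves move down.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// WPL from the definition: sum over all leaves of weight * path length.
// weights[i] and depths[i] describe the same leaf.
long weightedPathLength(const std::vector<long>& weights,
                        const std::vector<int>& depths) {
    long total = 0;
    for (std::size_t i = 0; i < weights.size() && i < depths.size(); ++i) {
        total += weights[i] * depths[i];
    }
    return total;
}

// WPL of the Huffman tree for the given leaf weights.
// The sum of all merged (internal node) weights equals the WPL.
long minimalWPL(const std::vector<long>& weights) {
    std::priority_queue<long, std::vector<long>, std::greater<long> >
        heap(weights.begin(), weights.end());
    long total = 0;
    while (heap.size() > 1) {
        long a = heap.top(); heap.pop();
        long b = heap.top(); heap.pop();
        total += a + b;     // weight of the new internal node
        heap.push(a + b);
    }
    return total;
}
```

For hypothetical leaf weights 45, 13, 12, 16, 9, 5 at depths 1, 3, 3, 3, 4, 4, both functions give 224.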

4. Characteristics of Huffman binary tree

  • Nodes with larger weights are closer to the root node: a leaf with a larger weight is never deeper than a leaf with a smaller weight.
  • There are no nodes with degree 1 in the tree. This kind of tree is also called a regular (strict) binary tree.
  • The weighted path length of the tree is the shortest.
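The second property also explains the node count `m = 2 * n - 1` used by the program in section 5. A short derivation, where n is the number of leaves and k (a symbol introduced only for this derivation) is the number of internal nodes:

```latex
\begin{align*}
\text{edges} &= 2k && \text{(every internal node has exactly two children)} \\
\text{edges} &= (n + k) - 1 && \text{(every node except the root has exactly one parent)} \\
2k &= n + k - 1 \;\Longrightarrow\; k = n - 1 \\
\text{total nodes} &= n + k = 2n - 1
\end{align*}
```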

5. Code about Huffman tree

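//Note: the reference parameters (&) below require a C++ compiler, even though only C headers are used
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef double DataType; //data type of a node's weight

typedef struct HTNode //information stored for a single node
{
	DataType weight; //weight
	int parent; //index of the parent node (0 means no parent yet)
	int lc, rc; //indices of the left and right children (0 means none)
}*HuffmanTree;

typedef char **HuffmanCode; //array of C strings, one code per leaf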

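//Among indices 1..n, find the two parentless nodes with the smallest weights; s1's weight is no greater than s2's
//The caller must make sure at least two parentless nodes exist in that range
void Select(HuffmanTree& HT, int n, int& s1, int& s2)
{
	int min = 0;
	//find the smallest: start from the first node that has no parent
	for (int i = 1; i <= n; i++)
	{
		if (HT[i].parent == 0)
		{
			min = i;
			break;
		}
	}
	for (int i = min + 1; i <= n; i++)
	{
		if (HT[i].parent == 0 && HT[i].weight < HT[min].weight)
			min = i;
	}
	s1 = min; //index of the smallest weight goes to s1
	//find the second smallest: same scan, skipping s1
	for (int i = 1; i <= n; i++)
	{
		if (HT[i].parent == 0 && i != s1)
		{
			min = i;
			break;
		}
	}
	for (int i = min + 1; i <= n; i++)
	{
		if (HT[i].parent == 0 && HT[i].weight < HT[min].weight && i != s1)
			min = i;
	}
	s2 = min; //index of the second smallest weight goes to s2
}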

//Build the Huffman tree
void CreateHuff(HuffmanTree& HT, DataType* w, int n)
{
	int m = 2 * n - 1; //total number of nodes in the Huffman tree
	HT = (HuffmanTree)calloc(m + 1, sizeof(HTNode)); //allocate m+1 nodes because index 0 stores no data
	if (HT == NULL)
	{
		printf("calloc fail\n");
		exit(-1);
	}
	for (int i = 1; i <= n; i++)
	{
		HT[i].weight = w[i - 1]; //assign the weights to the n leaf nodes
	}
	for (int i = n + 1; i <= m; i++) //create the internal nodes one by one
	{
		//pick the two smallest parentless nodes s1 and s2 and create their parent
		int s1, s2;
		Select(HT, i - 1, s1, s2); //search indices 1..i-1; s1's weight is no greater than s2's
		HT[i].weight = HT[s1].weight + HT[s2].weight; //weight of i is the sum of the weights of s1 and s2
		HT[s1].parent = i; //parent of s1 is i
		HT[s2].parent = i; //parent of s2 is i
		HT[i].lc = s1; //left child is s1
		HT[i].rc = s2; //right child is s2
	}
	//print the relationships between the nodes of the Huffman tree
	printf("Huffman tree:>\n");
	printf("Idx    Weight   Parent   Left     Right\n");
	printf("0                                  \n");
	for (int i = 1; i <= m; i++)
	{
		printf("%-4d   %-6.2f   %-6d   %-6d   %-6d\n", i, HT[i].weight, HT[i].parent, HT[i].lc, HT[i].rc);
	}
	printf("\n");
}

//Generate the Huffman codes
void HuffCoding(HuffmanTree& HT, HuffmanCode& HC, int n)
{
	HC = (HuffmanCode)malloc(sizeof(char*)*(n + 1)); //allocate n+1 slots because index 0 is unused
	char* code = (char*)malloc(sizeof(char)*n); //scratch buffer: a code has at most n-1 bits, plus 1 byte for '\0'
	code[n - 1] = '\0'; //the last position of the scratch buffer holds '\0'
	for (int i = 1; i <= n; i++)
	{
		int start = n - 1; //before generating each code, point start at the '\0'
		int c = i; //the node whose branch is being encoded, starting at leaf i
		int p = HT[c].parent; //parent of that node
		while (p) //climb until c is the root, whose parent index is 0
		{
			if (HT[p].lc == c) //a left child contributes '0', a right child '1'
				code[--start] = '0';
			else
				code[--start] = '1';
			c = p; //move one level up
			p = HT[c].parent; //parent of c
		}
		HC[i] = (char*)malloc(sizeof(char)*(n - start)); //allocate space for the code, including '\0'
		strcpy(HC[i], &code[start]); //copy the code into its slot in the array of strings
	}
	free(code); //release the scratch buffer
}

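//Main function
int main()
{
	int n = 0;
	printf("Enter the number of values:>");
	if (scanf("%d", &n) != 1 || n < 1) //reject a missing or non-positive count
	{
		printf("invalid count\n");
		return 1;
	}
	DataType* w = (DataType*)malloc(sizeof(DataType)*n);
	if (w == NULL)
	{
		printf("malloc fail\n");
		exit(-1);
	}
	printf("Enter the values:>");
	for (int i = 0; i < n; i++)
	{
		scanf("%lf", &w[i]);
	}
	HuffmanTree HT;
	CreateHuff(HT, w, n); //build the Huffman tree

	HuffmanCode HC;
	HuffCoding(HT, HC, n); //generate the Huffman codes

	for (int i = 1; i <= n; i++) //print the Huffman codes
	{
		printf("Code for value %.2f: %s\n", HT[i].weight, HC[i]);
		free(HC[i]); //release each code string once it has been printed
	}
	free(HC);
	free(HT);
	free(w);
	return 0;
}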

Origin blog.csdn.net/weixin_43313333/article/details/131453169