Huffman tree and Huffman code explanation

1. Huffman tree explanation

A Huffman tree, also known as an optimal binary tree, is the binary tree with the smallest weighted path length for a given set of leaves. Given n nodes, each with its own weight and all used as leaf nodes, many different binary trees can be constructed. The one whose weighted path length is the smallest is called the "optimal binary tree" or "Huffman tree".

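Here is the simplest possible example. Suppose five English letters occur with the following counts: A:100, B:20, C:60, D:2, E:23. A occurs most often and D least often, so A is the letter we will look up most in practice. We therefore place A closest to the root node and D farthest from it, which greatly reduces the time spent on lookups.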

First, a few terms used with Huffman trees.
Path: In a tree, the route from one node to another is called a path. In the figure, the route from the root node to node a is a path.

Path length: Each edge traversed along a path adds 1 to its length. For example, if the root node is taken as layer 1, the path length from the root node to a node on layer i is i-1. In Figure 1, the path length from the root node to node c is 3.

Node weight: A value assigned to a node is called the weight of that node. For example, in the figure, node a has a weight of 7 and node b has a weight of 5.

Weighted path length of a node: The product of the path length from the root node to that node and the weight of the node. In the figure, the weighted path length of node b is 2 * 5 = 10. The weighted path length (WPL) of a tree is the sum of the weighted path lengths of all its leaf nodes.
[Figure 1: example tree with weighted nodes a, b and c; image not available]
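To make the definitions above concrete, here is a minimal sketch that sums weight * path length over a set of leaves. The `Leaf` struct and `weightedPathLength` function are hypothetical names introduced only for this illustration, and the caller supplies the path lengths by hand, since the original figure is not available.

```cpp
#include <cassert>
#include <vector>

// Hypothetical leaf record: a weight and the path length from the root.
struct Leaf {
    int weight;
    int pathLength;
};

// Weighted path length of a tree: sum of weight * pathLength over all leaves.
int weightedPathLength(const std::vector<Leaf>& leaves) {
    int wpl = 0;
    for (const Leaf& leaf : leaves) {
        assert(leaf.pathLength >= 0);  // a path length cannot be negative
        wpl += leaf.weight * leaf.pathLength;
    }
    return wpl;
}
```

For instance, with assumed leaves of weight 7 at path length 1, weight 5 at path length 2, and weights 2 and 4 at path length 3, the function returns 7*1 + 5*2 + 2*3 + 4*3 = 35.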

2. How to generate a Huffman tree

For example, take the following letters and their weights:
A:90, B:50, C:40, D:100, E:14, F:7, G:2

  • First, find the two nodes with the smallest weights, F:7 and G:2, and join them into a binary tree whose root has weight 7 + 2 = 9.
  • Remove F and G from the set and add 9 in their place, then repeat the first step.
    Among A:90, B:50, C:40, D:100, E:14 and 9, the two smallest are E:14 and 9. Join them into a binary tree whose root has weight 23.
  • Repeat the operation on A:90, B:50, C:40, D:100 and 23. The two smallest are C:40 and 23, which form a binary tree with weight 63.
  • Repeat the operation on A:90, B:50, D:100 and 63. The two smallest are B:50 and 63, which form a binary tree with weight 113.
  • Repeat the operation on A:90, D:100 and 113. The two smallest are A:90 and D:100, which form a binary tree with weight 190.
  • Only two nodes remain, 113 and 190. Join them under the root, which has weight 303.
    This completes the construction of the Huffman tree.

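The merging steps above can be sketched with a min-heap. This is only an illustration under stated assumptions, not the implementation used later in this article: `mergeOrder` is a hypothetical helper, and it uses `std::priority_queue` instead of the array scan in the full listing. It returns the weight of each new internal node in the order it is created.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Repeatedly removes the two smallest weights and pushes back their sum.
// Returns the weights of the internal nodes in creation order.
std::vector<int> mergeOrder(const std::vector<int>& weights) {
    assert(!weights.empty());
    // std::greater turns the default max-heap into a min-heap.
    std::priority_queue<int, std::vector<int>, std::greater<int>>
        heap(weights.begin(), weights.end());
    std::vector<int> merged;
    while (heap.size() > 1) {
        int a = heap.top(); heap.pop();  // smallest remaining weight
        int b = heap.top(); heap.pop();  // second smallest
        merged.push_back(a + b);
        heap.push(a + b);                // the new subtree rejoins the set
    }
    return merged;
}
```

For the weights 90, 50, 40, 100, 14, 7, 2 this yields 9, 23, 63, 113, 190, 303, matching the walk-through. The sum of these internal weights, 701, equals the tree's weighted path length, because each leaf's weight is counted once for every ancestor it has.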
3. Code implementation
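#include <iostream>
#include <cstdio>   // printf
#include <cstdlib>  // malloc, free, exit
#include <cstring>  // strcpy
#define maxSize 100 // upper bound on the number of tree nodes (2n-1) handled by countWPL2
using namespace std;

// Attributes of a node
typedef struct Node
{
	int weight;                // weight
	int parent;                // every node starts with parent -1, meaning it has not joined a tree yet; once it is attached to a parent, this is no longer -1
	int lchild, rchild;        // indices of the left and right child nodes
}HTNode, *HuffmanTree;         // stores all nodes of the Huffman tree
typedef char **HuffmanCode;    // stores the Huffman code of each leaf node

HuffmanTree create_HuffmanTree(int *wet, int n);
void select_minium(HuffmanTree HT, int k, int &min1, int &min2);
int min(HuffmanTree HT, int k);
void HuffmanCoding2(HuffmanTree HT, HuffmanCode &HC, int n);
int countWPL2(HuffmanTree HT, int n);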
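int main()
{
	int n;
	cout << "please input how many num:";
	cin >> n;
	// countWPL2 uses a fixed visit[maxSize] array, so all 2n-1 tree nodes must fit in it
	if (!cin || n < 1 || 2 * n - 1 > maxSize)
	{
		printf("n must be between 1 and %d\n", (maxSize + 1) / 2);
		return 1;
	}
	int *w = new int[n];          // a variable-length array (int w[n]) is not standard C++
	for (int i = 0; i < n; i++)   // i must be initialized to 0
	{
		cout << "please input " << i << ":";
		cin >> w[i];
	}

	HuffmanCode HC = NULL;
	HuffmanTree hTree = create_HuffmanTree(w, n);

	// Compute the WPL first, because HuffmanCoding2 overwrites the weight fields
	int wpl2 = countWPL2(hTree, n);
	printf("Minimum weighted path length WPL found by traversing from the root: WPL=%d\n", wpl2);

	printf("\nCodes obtained by traversing from the root to each leaf:\n");
	HuffmanCoding2(hTree, HC, n);

	// Release everything that was allocated
	for (int i = 0; i < n; i++)
		free(HC[i]);
	free(HC);
	free(hTree);
	delete[] w;
	return 0;
}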

// Build the tree
HuffmanTree create_HuffmanTree(int *wet, int n)
{
	// A Huffman tree with n leaf nodes has 2n-1 nodes in total
	int total = 2 * n - 1;
	HuffmanTree HT = (HuffmanTree)malloc(total * sizeof(HTNode));
	if (!HT)
	{
		printf("HuffmanTree malloc failed!");
		exit(-1);
	}
	int i;

	/* First set the parent of every node to -1, meaning it has not joined a tree yet.
	HT[0], HT[1] ... HT[n-1] hold the n leaf nodes to be encoded; every leaf carries a weight. */
	for (i = 0; i < n; i++)
	{
		HT[i].parent = -1;
		HT[i].lchild = -1;
		HT[i].rchild = -1;
		HT[i].weight = *wet;
		wet++;
	}

	// Initialize the nodes without weights: HT[n], HT[n+1] ... HT[2n-2] hold the roots of the intermediate binary trees
	for (; i < total; i++)
	{
		HT[i].parent = -1;
		HT[i].lchild = -1;
		HT[i].rchild = -1;
		HT[i].weight = 0;
	}

	int min1, min2;
	// min1 and min2 hold the two nodes chosen in each round: smallest weight and parent == -1.
	// Each round joins min1 and min2 into a binary tree; the final round yields the Huffman tree.
	for (i = n; i < total; i++)
	{
		// Arguments: the tree pointer and the index of the node being built (only nodes before it are searched)
		select_minium(HT, i, min1, min2);
		HT[min1].parent = i;
		HT[min2].parent = i;
		// The left and right children may be swapped; the result is still a Huffman tree, only the codes differ
		HT[i].lchild = min1;
		HT[i].rchild = min2;
		HT[i].weight = HT[min1].weight + HT[min2].weight;
	}
	return HT;
}
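/*
Select, from the first k elements of HT, the two with the smallest weight whose parent is -1,
and store their indices in min1 and min2.
min() marks the node it returns, so the second call cannot pick the same node again.
*/
void select_minium(HuffmanTree HT, int k, int &min1, int &min2)
{
	min1 = min(HT, k);
	min2 = min(HT, k);
}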
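/*
Select, from the first k elements of HT, the one with the smallest weight whose parent is -1,
and return its index.
*/
int min(HuffmanTree HT, int k)
{
	int i = 0;
	int min;        // index of the element with the smallest weight and parent == -1
	int min_weight; // weight of that element

	// Start from the weight of the first element whose parent is -1 and compare against it.
	// Note that we cannot simply start from HT[0].weight as one normally would:
	// if HT[0].weight is small, HT[0] is taken in the first round, yet every later round
	// would still start its comparison from HT[0].weight and select HT[0] again,
	// which is a logic error.
	// So first find a node that has not joined a tree yet.
	while (HT[i].parent != -1)
		i++;
	min_weight = HT[i].weight; // remember its weight
	min = i;

	// Find the element with the smallest weight and parent == -1, and store its index in min
	for (; i < k; i++)
	{
		if (HT[i].weight < min_weight && HT[i].parent == -1)
		{
			min_weight = HT[i].weight;
			min = i;
		}
	}

	// Temporarily set the parent of the chosen element to 1 so that the next call skips it.
	// create_HuffmanTree overwrites this with the real parent index afterwards.
	HT[min].parent = 1;

	return min;
}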

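/*
Traverse the Huffman tree HT from the root to the leaves, without a stack and without recursion,
to obtain the Huffman codes of the n leaf nodes and store them in HC.
Note: this function overwrites the weight field of every node.
*/
void HuffmanCoding2(HuffmanTree HT, HuffmanCode &HC, int n)
{
	// Holds the pointers to the individual code strings
	HC = (HuffmanCode)malloc(n * sizeof(char *));
	if (!HC)
	{
		printf("HuffmanCode malloc failed!");
		exit(-1);
	}

	// Temporary buffer for the code currently being built.
	// In a Huffman tree with n leaves, no code is longer than n-1 characters,
	// so with the terminating '\0' a buffer of length n is enough.
	char *code = (char *)malloc(n * sizeof(char));
	if (!code)
	{
		printf("code malloc failed!");
		exit(-1);
	}

	int cur = 2 * n - 2;    // index of the node being visited, initially the root
	int code_len = 0;       // length of the current code

	// Once the tree is built, weight is reused as the traversal state of each node:
	// weight=0: neither child has been visited yet
	// weight=1: the left child has been visited, the right child has not
	// weight=2: both children have been visited
	int i;
	for (i = 0; i < cur + 1; i++)
	{
		HT[i].weight = 0;
	}

	// Start at the root and finish when we return to it.
	// The loop exits when cur becomes the root's parent (-1).
	while (cur != -1)
	{
		// Neither child visited yet: go left first
		if (HT[cur].weight == 0)
		{
			HT[cur].weight = 1;    // mark the left child as visited
			if (HT[cur].lchild != -1)
			{
				// Not a leaf: record a '0' and continue to the left
				code[code_len++] = '0';
				cur = HT[cur].lchild;
			}
			else
			{
				// A leaf: terminate the code and save it
				code[code_len] = '\0';
				HC[cur] = (char *)malloc((code_len + 1) * sizeof(char));
				if (!HC[cur])
				{
					printf("HC[cur] malloc failed!");
					exit(-1);
				}
				strcpy(HC[cur], code);  // copy the code string
			}
		}

		// Left child visited: now go right
		else if (HT[cur].weight == 1)
		{
			HT[cur].weight = 2;   // mark both children as visited
			if (HT[cur].rchild != -1)
			{
				// Not a leaf: record a '1' and continue to the right
				code[code_len++] = '1';
				cur = HT[cur].rchild;
			}
		}

		// Both children visited: go back to the parent and shorten the code by 1
		else
		{
			HT[cur].weight = 0;
			cur = HT[cur].parent;
			--code_len;
		}
	}
	for (int i = 0; i < n; ++i) {
		printf("%s\n", HC[i]);
	}
	free(code);
}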

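/*
Traverse the binary tree from the root to find the minimum weighted path length.
The key step is to find the path length of each leaf node; that path length multiplied by
the leaf's weight is the leaf's weighted path length. Adding up the weighted path lengths
of all leaves gives the result.
*/
int countWPL2(HuffmanTree HT, int n)
{
	int cur = 2 * n - 2;        // index of the node being visited, initially the root
	int countRoads=0, WPL=0;    // countRoads holds the path length of the current node

	// visit[] holds the traversal state of each node:
	// visit[cur]=0: neither child has been visited yet
	// visit[cur]=1: the left child has been visited, the right child has not
	// visit[cur]=2: both children have been visited
	// The array has a fixed size, so the caller must ensure 2n-1 <= maxSize.
	int visit[maxSize] = { 0 }; // state array, initialized to 0

	// Start at the root and finish when we return to it.
	// The loop exits when cur becomes the root's parent (-1).
	while (cur != -1)
	{
		// Neither child visited yet: go left first
		if (visit[cur]==0)
		{
			visit[cur] = 1;    // mark the left child as visited
			if (HT[cur].lchild != -1)
			{
				// Not a leaf: path length + 1, continue to the left
				countRoads++;
				cur = HT[cur].lchild;
			}
			else
			{
				// A leaf: compute its weighted path length and add it to the total
				WPL += countRoads * HT[cur].weight;
			}
		}

		// Left child visited: now go right
		else if (visit[cur]==1)
		{
			visit[cur] = 2;
			if (HT[cur].rchild != -1)
			{
				// Not a leaf: path length + 1, continue to the right
				countRoads++;
				cur = HT[cur].rchild;
			}
		}

		// Both children visited: go back to the parent and decrease the path length by 1
		else
		{
			visit[cur] = 0;
			cur = HT[cur].parent;
			--countRoads;
		}
	}
	return WPL;
}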
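The listing above derives the codes by walking down from the root. As a point of comparison, here is a minimal sketch of the opposite direction: start at each leaf and climb the parent links, collecting bits in reverse. It is an illustration under stated assumptions, not part of the original program. `Node`, `buildTree` and `codesFromLeaves` are hypothetical names, the tree uses the same array layout (leaves in indices 0..n-1, internal nodes after them), and `std::vector` and `std::string` replace the manual `malloc` calls.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node type mirroring the layout of the listing above.
struct Node {
    int weight;
    int parent;
    int lchild;
    int rchild;
};

// Builds the array-based tree: leaves in [0, n), internal nodes in [n, 2n-1).
std::vector<Node> buildTree(const std::vector<int>& weights) {
    assert(!weights.empty());
    const int n = static_cast<int>(weights.size());
    std::vector<Node> tree;
    for (int w : weights) tree.push_back({w, -1, -1, -1});
    for (int i = n; i < 2 * n - 1; ++i) {
        int min1 = -1, min2 = -1;  // the two lightest nodes without a parent
        for (int j = 0; j < i; ++j) {
            if (tree[j].parent != -1) continue;
            if (min1 == -1 || tree[j].weight < tree[min1].weight) {
                min2 = min1;
                min1 = j;
            } else if (min2 == -1 || tree[j].weight < tree[min2].weight) {
                min2 = j;
            }
        }
        tree[min1].parent = i;
        tree[min2].parent = i;
        tree.push_back({tree[min1].weight + tree[min2].weight, -1, min1, min2});
    }
    return tree;
}

// Derives each leaf's code by climbing to the root, then reversing the bits.
std::vector<std::string> codesFromLeaves(const std::vector<Node>& tree, int n) {
    std::vector<std::string> codes(n);
    for (int leaf = 0; leaf < n; ++leaf) {
        std::string bits;
        for (int cur = leaf; tree[cur].parent != -1; cur = tree[cur].parent) {
            int p = tree[cur].parent;
            bits.push_back(tree[p].lchild == cur ? '0' : '1');  // left edge = 0, right edge = 1
        }
        std::reverse(bits.begin(), bits.end());  // bits were collected leaf-to-root
        codes[leaf] = bits;
    }
    return codes;
}
```

With the weights 7, 5, 2, 4, calling `codesFromLeaves(buildTree({7, 5, 2, 4}), 4)` gives the codes 0, 10, 110 and 111. The bottom-up version leaves the weight fields untouched, so the weighted path length can still be computed afterwards.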

Origin blog.csdn.net/qq_45125250/article/details/109731584