Detailed explanation of the Huffman tree (C++ implementation)

1) Explanation of some terms:

  1. Path: the route from one node of a tree to another node. In Figure 1, the route from the root node to node a is a path.

  2. Path length: each time a path moves on to the next node, its length increases by 1, so the path length is the number of edges on the path. For example, if the root node is defined as layer 1, the path length from the root node to a node on the i-th layer is i-1. In Figure 1, the path length from the root node to node c is 3.

  3. Node weight: a numeric value assigned to a node is called the node's weight.

  4. Weighted path length of a node: the product of the path length from the root node to that node and the weight of the node.

2) Definition of Huffman tree:

Given n weighted nodes that must all become leaf nodes of one binary tree, the tree with the smallest weighted path length is called the "optimal binary tree", also known as the "Huffman tree". The weighted path length of a tree is the sum of the weighted path lengths of all its leaf nodes.

To minimize the weighted path length when constructing a Huffman tree, one principle is enough: the larger a node's weight, the closer it should be to the root of the tree. In Figure 1, node a has the largest weight, so it is placed directly as a child node of the root node.

3) Construction process:

Given n nodes, each with its own weight, a Huffman tree can be constructed as follows:

  1. Select the two smallest of the n weights. The two corresponding nodes become the left and right children of a new binary tree, and the weight of its root node is the sum of their weights;

  2. Remove those two weights from the original n weights and add the new root's weight to the remaining n-2 weights;

  3. Repeat steps 1 and 2 until all the nodes have been combined into a single binary tree. That tree is the Huffman tree.

4) Algorithm implementation:

Each step of the construction has to pick out the two nodes with the smallest weights and combine them into a new binary tree.

The idea for finding those two nodes is: starting from the beginning of the node array, first find two nodes that have no parent node (which means they have not yet been used to build a tree), then compare each remaining node without a parent against them in turn. There are two cases to consider:

  • If its weight is smaller than the smaller of the two nodes, keep this node together with the original smaller node and discard the original larger node;

  • If its weight lies between the weights of the two nodes, it replaces the original larger node.

#include<iostream>
#include<string.h>
#define  MAX 10000 
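/*
Sample run (program messages shown in English):

Enter a piece of text: this*program*is*my*favourite
Characters and weights:
Character: t  Weight: 2
Character: h  Weight: 1
Character: i  Weight: 3
Character: s  Weight: 2
Character: *  Weight: 4
Character: p  Weight: 1
Character: r  Weight: 3
Character: o  Weight: 2
Character: g  Weight: 1
Character: a  Weight: 2
Character: m  Weight: 2
Character: y  Weight: 1
Character: f  Weight: 1
Character: v  Weight: 1
Character: u  Weight: 1
Character: e  Weight: 1
********************************
Character codes:
t:1000
h:11010
i:001
s:1001
*:011
p:11011
r:010
o:1010
g:11100
a:1011
m:1100
y:11101
f:11110
v:11111
u:0000
e:0001
Encoded text:
100011010001100101111011010101011100010101111000110011001011110011101011111101011111111010000001000110000001
********************************
Decoding:
Enter the binary string to decode, ending with '#': 100011010001100101111011010101011100010101111000110011001011110011101011111101011111111010000001000110000001#
Decoded text:
this*program*is*my*favourite
Continue? (Y/N):
*/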
using namespace std;
typedef struct {
	char letter, *code;
	int weight;
	int parent, lchild, rchild;
}HTNode, *HuffmanTree;
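int n;	// number of distinct characters (leaf nodes); zero-initialized because it is a global
char coding[MAX];	// binary string entered for decoding, including the trailing '#'; the 108-bit sample above would overflow a 100-byte buffer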
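// Returns the index of the smallest-weight node in HT[0..i] that has no parent yet.
int Min(HuffmanTree &HT, int i)
{
	int j;
	int k = MAX;	// smallest weight seen so far; assumes every weight is below MAX
	int flag = -1;	// index of that node, -1 while none has been found
	for (j = 0; j <= i; ++j)
	{
		if (HT[j].weight < k && HT[j].parent == 0)	// parent == 0 means the node has not been selected yet
		{
			k = HT[j].weight;
			flag = j;
		}
	}
	if (flag >= 0)
		HT[flag].parent = 1;	// temporary mark so that the second call in Select skips this node
	return flag;
}
void Select(HuffmanTree &HT, int i, int &s1, int &s2)
{
	// Select the two nodes in HT[0..i] that have parent == 0 and the smallest weights.
	// Their indices are returned in s1 and s2, with weight(s1) <= weight(s2).
	s1 = Min(HT, i);
	s2 = Min(HT, i);
}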
void CreateHuffmanTree(HuffmanTree &HT, char t[], int w[])
{
	int m;
	int i, s1, s2;
	if (n <= 1)
		return;
	m = 2 * n - 1;	// a Huffman tree with n leaves has 2n-1 nodes in total
	HT = new HTNode[m + 1];	// allocate the node array
	for (i = 0; i<n; i++)	// leaf nodes: HT[0..n-1]
	{
		HT[i].code = NULL;	// no code assigned yet
		HT[i].parent = 0;
		HT[i].lchild = 0;
		HT[i].rchild = 0;
		HT[i].letter = t[i];
		HT[i].weight = w[i];
	}
	for (i = n; i <= m; i++)	// internal nodes, filled in by the loop below
	{
		HT[i].code = NULL;
		HT[i].parent = 0;
		HT[i].lchild = 0;
		HT[i].rchild = 0;
		HT[i].letter = ' ';
		HT[i].weight = 0;
	}
	cout << "********************************" << endl;
	for (i = n; i<m; i++)
	{
		Select(HT, i - 1, s1, s2);	// pick the two smallest-weight nodes without a parent in HT[0..i-1]

		HT[s1].parent = i;
		HT[s2].parent = i;	// node i becomes the parent of both

		HT[i].lchild = s1;
		HT[i].rchild = s2;	// s1 is the left child, s2 the right child
		HT[i].weight = HT[s1].weight + HT[s2].weight;	// the parent's weight is the sum of the two children's weights

	}
}
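void CreatHuffmanCode(HuffmanTree HT)
{
	int start, c, f;
	int i;
	char *cd = new char[n];	// scratch buffer: a code has at most n-1 bits plus the terminator
	cd[n - 1] = '\0';
	cout << "Character codes:" << endl;
	for (i = 0; i<n; i++)
	{
		start = n - 1;	// the code is written backwards, from the end of cd
		c = i;
		f = HT[i].parent;
		while (f != 0){	// climb from the leaf to the root (the root has parent 0)
			--start;
			if (HT[f].lchild == c){
				cd[start] = '0';	// c is a left child
			}
			else{
				cd[start] = '1';	// c is a right child
			}
			c = f;
			f = HT[f].parent;
		}
		HT[i].code = new char[n - start];
		strcpy(HT[i].code, &cd[start]);
		cout << HT[i].letter << ":" << HT[i].code << endl;
	}
	delete[] cd;	// cd was allocated with new[], so it must be released with delete[]
}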
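void HuffmanTreeYima(HuffmanTree HT, char cod[], int b)	// decode the first b characters of cod
{
	char *sen = new char[b + 1];	// decoded text: at most one character per input bit
	char *temp = new char[b + 1];	// bits read since the last successful match
	int t = 0;	// number of bits currently in temp
	int s = 0;	// number of characters currently in sen
	int count = 0;	// number of bits decoded successfully so far
	for (int i = 0; i<b; i++)
	{
		temp[t++] = cod[i];	// append the next bit
		temp[t] = '\0';	// keep temp a valid C string
		for (int j = 0; j<n; j++){	// compare against every character code
			if (!strcmp(HT[j].code, temp)){	// match found
				sen[s] = HT[j].letter;	// store the decoded character in sen
				s++;
				count += t;
				temp[0] = '\0';	// clear temp
				t = 0;
				break;
			}
		}
	}
	if (t == 0){	// temp is empty: every bit was matched, so print the result
		sen[s] = '\0';
		cout << "Decoded text:" << endl;
		cout << sen << endl;
	}
	else{	// temp is not empty: the trailing bits match no code
		cout << "The binary input is invalid, starting at bit " << count + 1 << endl;
	}
	delete[] sen;
	delete[] temp;
}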
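int main()
{
	HuffmanTree HT = NULL;
	char a[256], buff[1024] = "", p;	// a: distinct characters, buff: the input text, p: one character read during decoding
	int b[256];	// b[j] is the weight (number of occurrences) of a[j]
	int i, j;
	int symbol = 1, x, k;	// helper variables used while reading the string to decode
	cout << "Enter a piece of text: ";
	cin.width(sizeof(buff));	// read at most sizeof(buff) - 1 characters
	cin >> buff;
	int len = strlen(buff);
	for (i = 0; i<len; i++)	// count the occurrences of each distinct character
	{
		for (j = 0; j<n; j++)
		{
			if (a[j] == buff[i])
			{
				b[j] = b[j] + 1;
				break;
			}
		}
		if (j >= n)	// first occurrence of this character
		{
			a[n] = buff[i];
			b[n] = 1;
			n++;
		}
	}
	if (n < 2)	// CreateHuffmanTree builds nothing for fewer than two distinct characters
	{
		cout << "At least two distinct characters are required." << endl;
		return 0;
	}
	cout << "Characters and weights:" << endl;
	for (i = 0; i<n; i++)
	{
		cout << "Character: " << a[i] << "  Weight: " << b[i] << endl;
	}
	CreateHuffmanTree(HT, a, b);
	CreatHuffmanCode(HT);
	cout << "Encoded text:\n";
	for (int i = 0; i < len; i++)	// print the code of every character of the text
	{
		for (int j = 0; j < n; j++)
		{
			if (buff[i] == HT[j].letter)
			{
				cout << HT[j].code;
				break;
			}
		}
	}
	cout <<endl<< "********************************" << endl;
	cout << "Decoding:" << endl;
	while (1)
	{
		cout << "Enter the binary string to decode, ending with '#': ";
		x = 1;	// stays 1 while the input is valid (only 0, 1 and the final '#')
		k = 0;	// index of the next character to read
		symbol = 1;	// becomes 0 once '#' has been read
		while (symbol){
			if (!(cin >> p))
				p = '#';	// treat the end of input as the terminator
			if (p != '1'&&p != '0'&&p != '#'){	// any other character means the input is not binary
				x = 0;
			}
			if (k < (int)sizeof(coding))
				coding[k] = p;
			else
				x = 0;	// the input is longer than the buffer: reject it
			if (p == '#')
				symbol = 0;	// '#' marks the end of the input
			k++;
		}
		if (x == 1){
			HuffmanTreeYima(HT, coding, k - 1);	// decode everything before the '#'
		}
		else{
			cout << "Invalid input: illegal character or input too long." << endl;
		}
		cout << "Continue? (Y/N): ";
		cin >> p;
		if (p == 'y' || p == 'Y')
			continue;
		else
			break;
	}
	return 0;
}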

 

Origin blog.csdn.net/m0_49019274/article/details/115290550