Implementing Huffman tree encoding, compression, and decoding in C

Huffman tree: concepts and algorithm

1. Related concepts:

Path and path length: the branches leading from one node of the tree to another form a path, and the number of branches on it is the path length. The path length of a tree is the sum of the path lengths from the root to every node.
Weight: a value assigned to a node, usually reflecting how often it occurs. Taking a text file as an example, the weight of a character's node is the number of times that character appears, or equivalently that count divided by the size of the file.
Weighted path length: the weighted path length of a node is the product of its path length from the root and its weight. The weighted path length (WPL) of a tree is the sum of the weighted path lengths of all its leaf nodes.
Among all binary trees whose leaves carry the given weights, the one with the smallest WPL is the Huffman tree (optimal binary tree).
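Stated as a formula, the definition above reads as follows. The symbols $w_i$ and $l_i$ are notation introduced here for illustration:

$$\mathrm{WPL}(T) = \sum_{i=1}^{n} w_i \, l_i$$

Here $w_i$ is the weight of the $i$-th leaf and $l_i$ is its path length from the root. For example, leaves with weights 1, 2, 3, 4 at depths 3, 3, 2, 1 (the Huffman tree for these weights) give $\mathrm{WPL} = 1\cdot 3 + 2\cdot 3 + 3\cdot 2 + 4\cdot 1 = 19$. A balanced tree with every leaf at depth 2 would give 20.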
Algorithm description:
1. From the n given weights W_1, ..., W_n, form a set of n binary trees F = {T1, T2, ..., Tn}. Each Ti consists of a single root node with weight W_i, and its left and right children are empty.
2. Select the two trees in F whose roots have the smallest weights and make them the left and right subtrees of a new node. The weight of the new tree's root is the sum of the weights of the two subtrees' roots.
3. Remove those two trees from F and add the new tree to F.
4. Repeat steps 2 and 3 until F contains only one tree. That tree is the Huffman tree.
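The four steps can be checked using the weights alone. Below is a minimal sketch. `HuffmanWPL` is a hypothetical name, not part of the original code. The sketch keeps F as an array of root weights, repeatedly merges the two smallest, and accumulates the WPL. Each merge adds the new root's weight, which counts every leaf beneath it one more time, so the running total ends up equal to the sum of weight times depth over all leaves.

```c
/* Hypothetical helper: returns the WPL of the Huffman tree for the n
 * weights in w[]. w[] plays the role of the forest F and is overwritten. */
long HuffmanWPL(long w[], int n)
{
	long wpl = 0;
	while (n > 1) {						/* step 4: until F holds one tree */
		/* step 2: find the indices a, b of the two smallest weights */
		int a = 0, b = 1;
		if (w[b] < w[a]) { a = 1; b = 0; }
		for (int i = 2; i < n; i++) {
			if (w[i] < w[a]) { b = a; a = i; }
			else if (w[i] < w[b]) { b = i; }
		}
		long merged = w[a] + w[b];		/* weight of the new root */
		wpl += merged;					/* every leaf below it gets one level deeper */
		/* step 3: replace one tree with the merged tree and drop the other */
		w[a] = merged;
		w[b] = w[n - 1];				/* move the last tree into the freed slot */
		n--;
	}
	return wpl;
}
```

For the weights {1, 2, 3, 4} it returns 19, from the merges 1 + 2 = 3, 3 + 3 = 6, and 4 + 6 = 10.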

Abstract data type definition of a Huffman tree node:

/* Huffman tree node */
typedef struct _haffman {
	char data;							// data field holding the node's character
	int weight;							// weight
	struct _haffman *leftChild;			// left child node
	struct _haffman *rightChild;		// right child node
} HaffNode;

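Headers and macro definitions

#include <stdio.h>						// printf
#include <string.h>						// strcpy

#define MAX_SIZE 256					// number of codes
#define HALF_MAX 128					// half of MAX_SIZE
#define ASCII_SIZE 128					// number of ASCII codes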

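Global variables used by the functions below:

/* Tree nodes stored sequentially: the character mapping table for encoding and decoding, i.e. the original data */
HaffNode node[MAX_SIZE];

/* Holds all left child nodes (half the total number of nodes).
   CreatHaffmanTree indexes it by length - 1, so at most HALF_MAX distinct characters fit */
HaffNode left[HALF_MAX];

/* Holds all right child nodes (half the total number of nodes); same size limit as left[] */
HaffNode right[HALF_MAX];

/** 2D array holding the code string of each leaf node, indexed by character */
char code[MAX_SIZE][HALF_MAX];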
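The fragments never show how `node[]` gets filled. Here is a minimal sketch of that step, assuming the input is a NUL-terminated string; a file would be read byte by byte in the same way. `CountFrequencies` is a hypothetical name. The struct is repeated so that the block compiles on its own. Raw counts are used as weights, because dividing every count by the text length would not change any weight comparison and therefore would not change the tree.

```c
#include <stddef.h>

typedef struct _haffman {
	char data;
	int weight;
	struct _haffman *leftChild;
	struct _haffman *rightChild;
} HaffNode;

/* Hypothetical helper: writes one leaf per distinct byte of text into
 * out[] (which needs room for 256 entries) and returns the leaf count. */
int CountFrequencies(const char *text, HaffNode *out)
{
	int counts[256] = {0};
	for (const unsigned char *p = (const unsigned char *)text; *p; p++)
		counts[*p]++;					/* tally each byte value */
	int n = 0;
	for (int c = 0; c < 256; c++) {
		if (counts[c] > 0) {			/* only characters that occur become leaves */
			out[n].data = (char)c;
			out[n].weight = counts[c];
			out[n].leftChild = NULL;	/* leaves have no children */
			out[n].rightChild = NULL;
			n++;
		}
	}
	return n;
}
```

For "abracadabra" it produces five leaves in byte order: a (5), b (2), c (1), d (1), r (2). On arbitrary binary data, up to 256 leaves can appear. That is more than the `left[]` and `right[]` arrays above can hold.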

Building the Huffman tree:

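void SortHaffmanNode(HaffNode * node, int length);	// defined below; declared here so the call compiles

/* Build the Huffman tree
	@param node  node array; when the recursion finishes, node[0] is the root
	@param length  number of nodes in the array
*/
void CreatHaffmanTree(HaffNode * node, int length)
{
	if (length <= 0)
	{
		printf("Length is 0, cannot build the Huffman tree!\n");
		return;
	}
	if (length == 1)		// only one tree left in F: node[0] is the root, so stop (otherwise node[-1] would be accessed)
		return;
	SortHaffmanNode(node, length);		// first sort the nodes by weight (descending)
	HaffNode parent;									// parent node built from the last two nodes of the array
	left[length - 1] = node[length - 1];				// node with the smallest weight
	right[length - 1] = node[length - 2];				// node with the second-smallest weight
	parent.data = '\0';									// internal nodes carry no character
	parent.weight = left[length - 1].weight + right[length - 1].weight;	// sum the weights
	parent.leftChild = &left[length - 1];				// left child points to the smaller one
	parent.rightChild = &right[length - 1];				// right child points to the larger one
	node[length - 2] = parent;					// replace the second-to-last node with parent; the array shrinks by one
	CreatHaffmanTree(node, length - 1);			// recurse with length - 1; every call re-sorts the array
}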

Before the tree is built, the node array has to be sorted by weight. That is the job of the SortHaffmanNode() function called above. I use bubble sort in descending order, so the two smallest weights end up at the end of the array.
Its implementation:


/* Sort the Huffman nodes -- bubble sort
 * @param node  node array
 * @param length  number of nodes
 */
void SortHaffmanNode(HaffNode * node, int length)
{
	HaffNode tempNode;
	for (int i = 0; i < length - 1; i++)
	{
		for (int j = 0; j < length - i - 1; j++)
		{
			if (node[j].weight < node[j + 1].weight)		// compare by weight -- sort from largest to smallest
			{
				tempNode = node[j];
				node[j] = node[j + 1];
				node[j + 1] = tempNode;
			}
		}
	}
}
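The bubble sort runs again on every recursive call. If that ever becomes a bottleneck, the C standard library's `qsort` can be used in its place. Below is a sketch using the hypothetical names `CompareByWeightDesc` and `SortHaffmanNodeQsort`. `qsort` is not guaranteed to be stable, so nodes with equal weights may come out in a different order than with the bubble sort. That can change individual codes but not the WPL. If the decoder rebuilds the tree from stored frequencies, it must therefore use the same sort as the encoder.

```c
#include <stdlib.h>

typedef struct _haffman {
	char data;
	int weight;
	struct _haffman *leftChild;
	struct _haffman *rightChild;
} HaffNode;

/* Orders nodes by weight, largest first (same order as the bubble sort). */
static int CompareByWeightDesc(const void *a, const void *b)
{
	const HaffNode *x = a, *y = b;
	/* positive when y is heavier, so heavier nodes move toward the front */
	return (y->weight > x->weight) - (y->weight < x->weight);
}

/* Hypothetical drop-in replacement for SortHaffmanNode. */
void SortHaffmanNodeQsort(HaffNode *node, int length)
{
	qsort(node, (size_t)length, sizeof(HaffNode), CompareByWeightDesc);
}
```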

Huffman encoding:

Process: start at the root. A left branch contributes a 0 and a right branch contributes a 1. The bits collected on the way down to a leaf form the code of that leaf's character. Because every code is read as a path from the root, one depth-first traversal from the root produces the codes of all characters.
Code implementation:

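/**
 * Encoding function
 * @param node  current node of the Huffman tree (call it with the root)
 * @param tempnode  buffer holding the 0/1 path from the root to node
 * @param index  current depth, i.e. the next position to write in tempnode
 */
void Coding(HaffNode * node, char * tempnode, int index)
{
	if (!node) return;
	if (node->leftChild == NULL || node->rightChild == NULL)
	{
		// leaf: save the path as a string in the code table
		tempnode[index] = '\0';			// string terminator
		strcpy(code[(unsigned char)node->data], tempnode);	// index the table by the character's byte value; the cast keeps bytes >= 128 from producing a negative index
		return;							// a leaf has no children to visit
	}
	tempnode[index] = '0';			// the left branch is coded 0
	Coding(node->leftChild, tempnode, index + 1);			// recurse into the left branch first
	tempnode[index] = '1';			// the right branch is coded 1
	Coding(node->rightChild, tempnode, index + 1);			// then into the right branch
}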
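The title mentions compression, but the fragments stop once the code table is built. Here is a minimal sketch of the next step. It assumes code rows of 128 characters, like `code[MAX_SIZE][HALF_MAX]` above, and bits packed most-significant-first. `PackBits` is a hypothetical name. Because output is written in whole bytes, the last byte is padded with zero bits. The function therefore returns the number of meaningful bits, which would have to be saved with the compressed data.

```c
#include <stddef.h>

/* Hypothetical helper: appends the code of each character of text to
 * out[] as packed bits, MSB first, and returns the number of bits used.
 * out[] must have room for (total bits + 7) / 8 bytes. */
size_t PackBits(const char *text, char codes[][128], unsigned char *out)
{
	size_t bit = 0;
	for (const unsigned char *p = (const unsigned char *)text; *p; p++) {
		for (const char *c = codes[*p]; *c; c++, bit++) {
			if (bit % 8 == 0)
				out[bit / 8] = 0;		/* start a fresh, zero-filled byte */
			if (*c == '1')
				out[bit / 8] |= (unsigned char)(0x80 >> (bit % 8));
		}
	}
	return bit;							/* bits past this count in the last byte are padding */
}
```

With the codes a = 1, b = 01, c = 00, packing "abca" uses 6 bits and yields the single byte 0xA4 (binary 10100100). The final two zero bits are padding.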

Decoding with the Huffman tree:

Process: read the 0/1 bit stream from the beginning, using the same tree that produced the codes. Starting at the root, a 0 moves to the left child and a 1 moves to the right child. When a node with no children (a leaf) is reached, output its character and go back to the root for the next bit.
Code:

/** Decoding step -- flag 0/1 says which way to go: 0 goes left, 1 goes right */
HaffNode *Unziped(HaffNode *node, int flag)
{
	if (flag == 1)
		return node->rightChild;		// right subtree
	else if (flag == 0)
		return node->leftChild;			// left subtree
	return NULL;						// any other flag value is invalid
}
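`Unziped()` performs a single step. Here is a possible driver loop, sketched under the assumption that bits were packed most-significant-first with the final byte zero-padded. `DecodeBits` is a hypothetical name. One common source of extra characters at the end of a decompressed file is decoding those padding bits as if they were data. This loop avoids that by stopping at a stored bit count. It also assumes the tree has at least two leaves, because a lone root has no children to step into.

```c
#include <stddef.h>

typedef struct _haffman {
	char data;
	int weight;
	struct _haffman *leftChild;
	struct _haffman *rightChild;
} HaffNode;

/* Hypothetical decoder: reads nbits bits from in[] (MSB first), writes the
 * decoded characters plus a terminating '\0' to out[], returns the count. */
size_t DecodeBits(const HaffNode *root, const unsigned char *in, size_t nbits, char *out)
{
	size_t count = 0;
	const HaffNode *cur = root;
	for (size_t bit = 0; bit < nbits; bit++) {
		int flag = (in[bit / 8] >> (7 - bit % 8)) & 1;
		cur = flag ? cur->rightChild : cur->leftChild;	/* same step as Unziped() */
		if (cur->leftChild == NULL && cur->rightChild == NULL) {
			out[count++] = cur->data;	/* reached a leaf: emit its character */
			cur = root;					/* the next code starts at the root again */
		}
	}
	out[count] = '\0';
	return count;
}
```

Consider the tree with a = 1, b = 01, c = 00 and the byte 0xA4. Decoding 6 bits gives "abca". Decoding all 8 bits gives "abcac", where the extra c comes from the padding.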

That is my C implementation of the Huffman tree, covering its three stages: construction, encoding, and decoding. I learned it from online tutorials. However, when I actually encode and decode a file, the decompressed file contains some extra characters that the original does not have. I still don't know where the problem is; maybe something goes wrong during construction. I'd be grateful for any pointers from more experienced people!
If anyone needs the complete code, feel free to send me a private message. I'm a beginner and don't know how to upload the project folder, so any advice on that is welcome too!