Huffman tree and Huffman coding (based on priority queue implementation)

Concepts

Basics

  • Path: the sequence of branches leading from one node to another in a tree
  • Path length: the number of branches on a path
  • Path length of the tree: the sum of the path lengths from the root to every node
  • Weighted path length of a node: the product of the node's weight and the path length from the root to that node
  • Weighted path length (WPL) of the tree: the sum of the weighted path lengths of all leaf nodes

Huffman tree concept

From the definition of the tree's weighted path length (WPL) we get the following formula, where $w_i$ is the weight of the $i$-th leaf node and $l_i$ is its path length from the root:
$$WPL(n) = \sum_{i=1}^{n} w_i l_i$$
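
To make the formula concrete, here is a minimal sketch that computes the WPL from a list of leaf weights and the matching list of leaf path lengths. The function name `weightedPathLength` and its parameters are made up for this illustration and are not part of the code later in this article.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: WPL = sum of w_i * l_i over all leaves.
// weights[i] is the weight of leaf i, lengths[i] its path length from the root.
long long weightedPathLength(const std::vector<int>& weights,
                             const std::vector<int>& lengths) {
    long long wpl = 0;
    for (std::size_t i = 0; i < weights.size() && i < lengths.size(); ++i) {
        wpl += static_cast<long long>(weights[i]) * lengths[i];
    }
    return wpl;
}
```

For example, three leaves with weights 1, 2, 3 at path lengths 2, 2, 1 give 1·2 + 2·2 + 3·1 = 9.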

Given the following $n$ weights
$$\{w_1, w_2, \ldots, w_n\}$$
construct a binary tree with $n$ leaf nodes, where the $i$-th leaf has weight $w_i$.

Among all such trees, the binary tree with the smallest WPL is called the Huffman tree, also called the optimal binary tree.

That is to say, the WPL of the Huffman tree satisfies the following condition
$$WPL_{Huffman} = WPL_{min}(n)$$
It follows from this definition that the shape of a Huffman tree is not unique, but every Huffman tree built from the same set of weights has the same WPL.

Starting from the root node, label each edge to a left child 0 and each edge to a right child 1. The binary sequence formed by the labels of the edges on the way down to a leaf node is the Huffman code of that leaf.

The advantage of Huffman coding is that each leaf node (corresponding to a unique letter or other piece of data) gets its own code, and no code is a prefix of another code, which guarantees that decoding is unambiguous.
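
The prefix property can be checked mechanically. The following is a small sketch, not part of the class shown later; the name `isPrefixFree` is made up for this example. It returns true only when no code in the list is a prefix of another.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns true when no code is a prefix of another code.
bool isPrefixFree(const std::vector<std::string>& codes) {
    for (std::size_t i = 0; i < codes.size(); ++i) {
        for (std::size_t j = 0; j < codes.size(); ++j) {
            if (i == j) continue;
            // compare(0, len, other) == 0 means codes[j] starts with codes[i]
            if (codes[j].compare(0, codes[i].size(), codes[i]) == 0) return false;
        }
    }
    return true;
}
```

A set such as 0, 10, 110, 111 passes the check, while 0, 01, 11 fails because 0 is a prefix of 01.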

There is also a useful property:

For a Huffman tree with $n_0$ leaf nodes, the total number of nodes is
$$node = 2n_0 - 1$$
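
The formula follows from a short counting argument, sketched here. The symbols $n$ (the total number of nodes, written node above) and $n_2$ (the number of nodes with two children) are introduced only for this derivation. It relies on the fact that a Huffman tree built by joining two trees at a time has no node with exactly one child.

```latex
\begin{aligned}
n &= n_0 + n_2 && \text{(every node is a leaf or has two children)} \\
n - 1 &= 2 n_2 && \text{(count the edges: one per non-root node, two per internal node)} \\
\Rightarrow\quad n_2 &= n_0 - 1, \qquad n = 2 n_0 - 1
\end{aligned}
```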

Building Huffman Tree and Obtaining Huffman Code

Suppose we have the following set of four letters and their weights
$$\{(A,2),(B,5),(C,4),(D,9)\}$$
and use this data to build a Huffman tree.

First, treat each item as a binary tree with only one node.

Take out the two binary trees whose root nodes have the smallest weights
$$\{2, 4\}$$
and connect them to the same new parent node (the root weight of the left subtree is less than or equal to that of the right subtree). The weight of the parent node is then 6.

[Figure: A (2) and C (4) joined under a parent node of weight 6]

Together with the tree just created, the trees that have not been used yet give the following root weights
$$\{5, 6, 9\}$$
Again select the two smallest (if the weight of a newly built root equals the weight of an original root, the tree of the original root becomes the left subtree and the tree of the newly built root becomes the right subtree) and repeat the above steps to get the following figure.

[Figure: B (5) and the weight-6 tree joined under a parent node of weight 11]
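
The tie rule described above (on equal weights, the original tree goes left and the newly built tree goes right) can be made deterministic by storing a sequence number next to each weight. This is only a sketch under that assumption; the names `Entry`, `MinQueue` and `popTwoSmallest` are hypothetical, and the class later in this article does not do this.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical queue entry: (weight, sequence number).
// Trees created later get a larger sequence number, so when two weights are
// equal the older tree is popped first and can be used as the left subtree.
using Entry = std::pair<int, int>;
using MinQueue = std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>>;

// Pops the two smallest entries and returns them as (left, right).
// Assumes the queue holds at least two entries.
std::pair<Entry, Entry> popTwoSmallest(MinQueue& q) {
    Entry left = q.top();
    q.pop();
    Entry right = q.top();
    q.pop();
    return {left, right};
}
```

Because `std::pair` compares the first member and then the second, two entries with the same weight come out in the order in which they were created.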

Now only two trees are left, with the following root weights
$$\{9, 11\}$$
Perform the above steps once more to use up all the trees and get the complete Huffman tree.

[Figure: the complete Huffman tree, with D (9) and the weight-11 tree joined under the root of weight 20]
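
A handy check on this walkthrough: the WPL of a Huffman tree equals the sum of the weights of all parent nodes created while merging. Here that is 6 + 11 + 20 = 37, the same as 2·3 + 4·3 + 5·2 + 9·1. The sketch below computes it with a min-heap of weights only, without building any tree; `huffmanWpl` is a made-up name for this example.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: returns the WPL of a Huffman tree for the given weights
// without building the tree. Each merge adds the new parent's weight to the total.
long long huffmanWpl(const std::vector<int>& weights) {
    std::priority_queue<long long, std::vector<long long>, std::greater<long long>>
        roots(weights.begin(), weights.end());
    long long wpl = 0;
    while (roots.size() > 1) {
        long long a = roots.top(); roots.pop();   // smallest root weight
        long long b = roots.top(); roots.pop();   // second smallest root weight
        wpl += a + b;                             // weight of the new parent node
        roots.push(a + b);
    }
    return wpl;
}
```

Calling `huffmanWpl({2, 5, 4, 9})` returns 37 for the four letters used here.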

Label each edge to a left child 0 and each edge to a right child 1

[Figure: the complete Huffman tree with each left edge labeled 0 and each right edge labeled 1]

Now you can read off the Huffman code of each letter

(I usually write the code sequence from right to left, that is, starting at the leaf and ending at the root. For example, A is 011; written from left to right, from the root down to the leaf, it would be 110. This article and the code demo below both use the right-to-left convention.)
$$A: 011,\quad B: 01,\quad C: 111,\quad D: 0$$
For the string BDA, the corresponding Huffman code is as follows (the string is read from left to right, and each code is read from right to left when decoding)
$$01\ 0\ 011$$
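
Encoding a string is then just a table lookup per character. The sketch below assumes a table from character to code like the one above; the function name `encode` is hypothetical and is not part of the class in the code demo.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper: concatenates the code of every character in text.
// Characters missing from the table are skipped in this sketch.
std::string encode(const std::string& text,
                   const std::unordered_map<char, std::string>& table) {
    std::string bits;
    for (char c : text) {
        auto it = table.find(c);
        if (it != table.end()) bits += it->second;
    }
    return bits;
}
```

With the right-to-left codes A: 011, B: 01, C: 111, D: 0, encoding BDA gives 010011.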

Code demo

Class definition

First define a Huffman tree class

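#include <functional>
#include <queue>
#include <stack>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>
using namespace std;

typedef vector<pair<int, char>> huff;	// input data: (weight, character) pairs

class Huffman {
private:
	// Node definition
	class node {
	public:
		char word;		// character stored in a leaf (' ' for internal nodes)
		int weight;		// weight
		node *lc, *rc;	// left child, right child
		string code;	// Huffman code (filled in for leaf nodes only)

		node(int weight, char word) {
			lc = nullptr;
			rc = nullptr;
			code = "";

			this->word = word;
			this->weight = weight;
		}
	};

private:
	// Priority queue of (weight, node pointer) pairs, used as a min-heap.
	// Equal weights are ordered by comparing the node pointers, so the order of ties is not guaranteed.
	priority_queue<pair<int, node*>, vector<pair<int, node*>>, greater<pair<int, node*>>> que;
	node* root;	// root node
	unordered_map<char, string> un_map;	// character -> Huffman code

public:
	Huffman(vector<pair<int, char>> vec);
	string Find(char index);

private:
	void _CreateCode(node *n, stack<string> st, int flag);
};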

Methods

Constructor

Huffman::Huffman(vector<pair<int,char>> vec)
{
	// Convert the input data into nodes and store them in the priority queue
	for (auto it = vec.begin(); it != vec.end(); it++)
	{
		node* p = new node(it->first, it->second);
		this->que.push(make_pair(it->first, p));
	}

	// Build the Huffman tree
	int cnt = 0;	// counter: reset to 0 each time two popped nodes are joined under a parent
	pair<int,node*> temp[2];
	while (!que.empty())
	{
		if (cnt >= 2)
		{
			node* p = new node(temp[0].first + temp[1].first, ' ');
			p->lc = temp[0].second;
			p->rc = temp[1].second;
			que.push(make_pair(temp[0].first + temp[1].first, p));
			cnt = 0;

			this->root = p;	// keep updating the root node
			continue;
		}

		temp[cnt] = que.top();
		que.pop();
		cnt++;
	}
	// Two nodes are left in temp at the end; join them to form the root
	// (this assumes vec holds at least two entries)
	node* p = new node(temp[0].first + temp[1].first, ' ');
	p->lc = temp[0].second;
	p->rc = temp[1].second;
	this->root = p;

	// Build the codes of the leaf nodes: 0 for a left branch, 1 for a right branch
	stack<string> st;
	_CreateCode(this->root, st, 0);
}
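
The constructor allocates every node with new, and the class as shown has no destructor, so the nodes are never freed. One possible cleanup is a post-order traversal that deletes the children before the parent. The sketch below shows the idea on a stand-in struct; `TreeNode` and `destroyTree` are hypothetical names, and wiring this into the class (for example from a destructor) is left as an assumption.

```cpp
// Stand-in for the article's node class (hypothetical, reduced to the child pointers).
struct TreeNode {
    TreeNode* lc = nullptr;
    TreeNode* rc = nullptr;
};

// Frees a whole subtree in post-order and returns how many nodes were deleted.
int destroyTree(TreeNode* n) {
    if (n == nullptr) return 0;
    int freed = destroyTree(n->lc) + destroyTree(n->rc);
    delete n;
    return freed + 1;
}
```

For a tree with three leaves this deletes five nodes, which matches the 2n_0 - 1 node count from the concept section.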

Code mapping

Save the Huffman code of each letter in its leaf node, and record the letter-to-code mapping in the map

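void Huffman::_CreateCode(node *n, stack<string> st, int flag)
{
	if (!n) return;

	// flag: -1 = reached through a left edge, 1 = through a right edge, 0 = root (no edge)
	flag == -1 ? st.push("0") : (flag == 0 ? st.push("") : st.push("1"));
	if (n->lc == nullptr && n->rc == nullptr)
	{
		// Leaf: popping the stack yields the edges from the leaf back up to the root,
		// which is the right-to-left order used in this article
		while (!st.empty())
		{
			n->code += st.top();
			st.pop();
		}
		un_map[n->word] = n->code;
		return;
	}

	// st is passed by value, so each branch gets its own copy of the path so far
	this->_CreateCode(n->lc, st, -1);
	this->_CreateCode(n->rc, st, 1);
}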
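
The same traversal can also be written with a string that grows by one character per edge instead of a stack. This is a sketch of that variant on a stand-in struct, not the method used above; `CodeNode` and `collectCodes` are hypothetical names. It collects root-to-leaf codes and, when `rightToLeft` is set, reverses each one to match the convention of this article.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>

// Stand-in for the article's node class (hypothetical).
struct CodeNode {
    char word;
    CodeNode* lc;
    CodeNode* rc;
};

// Collects root-to-leaf codes: append '0' for a left edge and '1' for a right edge.
// With rightToLeft set, each code is reversed to match the article's convention.
void collectCodes(const CodeNode* n, const std::string& prefix, bool rightToLeft,
                  std::unordered_map<char, std::string>& out) {
    if (n == nullptr) return;
    if (n->lc == nullptr && n->rc == nullptr) {
        std::string code = prefix;
        if (rightToLeft) std::reverse(code.begin(), code.end());
        out[n->word] = code;
        return;
    }
    collectCodes(n->lc, prefix + "0", rightToLeft, out);
    collectCodes(n->rc, prefix + "1", rightToLeft, out);
}
```

On the example tree (D on the left of the root, B on the left of the weight-11 node, A and C under the weight-6 node) this yields A: 011 with `rightToLeft` set and A: 110 without it.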

Query

To query a code, simply return the value stored under the given key in un_map (an unordered_map<char, string>).

string Huffman::Find(char index)
{
	return this->un_map[index];
}
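
One thing to keep in mind: `operator[]` on an `unordered_map` inserts an empty string when the key is missing, so querying an unknown character quietly adds it to the map. If that is not wanted, a lookup with `find` avoids it. The free function `findCode` below is a hypothetical sketch of that variant, not part of the class.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical read-only lookup: returns the stored code, or an empty string
// when the character has no code. Unlike operator[], find() never inserts.
std::string findCode(const std::unordered_map<char, std::string>& table, char c) {
    auto it = table.find(c);
    return it == table.end() ? std::string() : it->second;
}
```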

Conclusion

One shortcoming of the code here is that decoding is not implemented. I think decoding can be done by matching the Huffman code of each character against the encoded string as a substring. That is more of a string-algorithm problem, so I did not write it here.
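
For reference, here is one way such a decoder might look. It is only a sketch under a loud assumption: the table must hold the root-to-leaf (left-to-right) codes, because only in that order is no code a prefix of another. The right-to-left codes returned by Find would have to be reversed first. The name `decode` is hypothetical.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical decoder. Assumes the table maps each character to its
// root-to-leaf (left-to-right) code, so that no code is a prefix of another.
std::string decode(const std::string& bits,
                   const std::unordered_map<char, std::string>& table) {
    // Invert the table: code -> character.
    std::unordered_map<std::string, char> byCode;
    for (const auto& kv : table) byCode[kv.second] = kv.first;

    std::string text, current;
    for (char bit : bits) {
        current += bit;
        auto it = byCode.find(current);
        if (it != byCode.end()) {   // a complete code has been read
            text += it->second;
            current.clear();
        }
    }
    return text;   // leftover bits in current are ignored in this sketch
}
```

With the left-to-right codes A: 110, B: 10, C: 111, D: 0, the bit string 100110 decodes to BDA.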

Origin blog.csdn.net/qq_42759112/article/details/127505908