Comic: What is a "Huffman tree"?

Author | 小灰 (Xiaohui)

Source | Programmer Xiaohui (ID: chengxuyuanxiaohui)


Concept 1: What is a path?

In a tree, the sequence of nodes passed through when going from one node to another is called the path between those two nodes.

In the binary tree above, the path from the root node A to the leaf node H is A, B, D, H.

Concept 2: What is the path length?

In a tree, the number of "edges" traversed from one node to another is called the length of the path between the two nodes.

In the same binary tree, the path from the root node A to the leaf node H passes through 3 edges, so its path length is 3.
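To make the two definitions concrete, here is a minimal Java sketch. Because the figure is not reproduced here, the tree shape A(B(D(H, I), E), C(F, G)) and the names PathDemo, TreeNode, sampleTree and pathTo are assumptions for illustration only. It collects the nodes on the path with a depth-first search and reports the path length as the number of edges, which is one less than the number of nodes on the path.

```java
import java.util.ArrayList;
import java.util.List;

class PathDemo {
    // Hypothetical binary-tree node that only carries a label
    static class TreeNode {
        final String name;
        final TreeNode left, right;

        TreeNode(String name, TreeNode left, TreeNode right) {
            this.name = name;
            this.left = left;
            this.right = right;
        }

        TreeNode(String name) {
            this(name, null, null);
        }
    }

    // Assumed shape of the example tree: A(B(D(H, I), E), C(F, G))
    static TreeNode sampleTree() {
        return new TreeNode("A",
                new TreeNode("B",
                        new TreeNode("D", new TreeNode("H"), new TreeNode("I")),
                        new TreeNode("E")),
                new TreeNode("C", new TreeNode("F"), new TreeNode("G")));
    }

    // Returns the nodes on the path from root to target, or an empty list if target is absent
    static List<String> pathTo(TreeNode root, String target) {
        List<String> path = new ArrayList<>();
        collect(root, target, path);
        return path;
    }

    // Depth-first search: a node stays on `path` only if target is in its subtree
    private static boolean collect(TreeNode node, String target, List<String> path) {
        if (node == null) return false;
        path.add(node.name);
        if (node.name.equals(target)
                || collect(node.left, target, path)
                || collect(node.right, target, path)) {
            return true;
        }
        path.remove(path.size() - 1); // backtrack: target is not below this node
        return false;
    }

    public static void main(String[] args) {
        List<String> path = pathTo(sampleTree(), "H");
        // Path length = number of edges = number of nodes on the path minus 1
        System.out.println("path = " + path + ", length = " + (path.size() - 1));
    }
}
```

Running it prints path = [A, B, D, H], length = 3.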

Concept 3: What is the weighted path length of a node?

Each node of a tree can carry its own "weight"; what the weight represents depends on the algorithm that uses it.

The weighted path length of a node is the product of the node's weight and the length of the path from the root node of the tree to that node.

Suppose node H has weight 3 and the path length from the root node to H is also 3. Then the weighted path length of node H is 3 × 3 = 9.

Concept 4: What is the weighted path length of the tree?

The weighted path length of a tree, or WPL for short, is the sum of the weighted path lengths of all its leaf nodes.

Taking the same binary tree as an example, its weighted path length (WPL) is 3 × 3 + 6 × 3 + 1 × 2 + 4 × 2 + 8 × 2 = 53.
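The WPL can be computed in one recursive pass that tracks the depth (path length from the root) of each node. A minimal sketch follows; the tree shape is an assumption that matches the sum above (leaves 3 and 6 at depth 3, leaves 1, 4 and 8 at depth 2), and WplDemo, wpl and exampleTree are hypothetical names.

```java
class WplDemo {
    // Hypothetical node: only leaf weights count toward WPL, so internal nodes use weight 0
    static class Node {
        final int weight;
        final Node left, right;

        Node(int weight) {
            this(weight, null, null);
        }

        Node(int weight, Node left, Node right) {
            this.weight = weight;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    // Sum of weight * depth over all leaves, where depth is the path length from the root
    static int wpl(Node node, int depth) {
        if (node == null) return 0;
        if (node.isLeaf()) return node.weight * depth;
        return wpl(node.left, depth + 1) + wpl(node.right, depth + 1);
    }

    // Assumed shape: leaves 3 and 6 at depth 3, leaves 1, 4 and 8 at depth 2
    static Node exampleTree() {
        return new Node(0,
                new Node(0,
                        new Node(0, new Node(3), new Node(6)),
                        new Node(1)),
                new Node(0, new Node(4), new Node(8)));
    }

    public static void main(String[] args) {
        System.out.println(wpl(exampleTree(), 0)); // 3*3 + 6*3 + 1*2 + 4*2 + 8*2 = 53
    }
}
```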

The Huffman tree was invented by Dr. Huffman of the Massachusetts Institute of Technology in 1952. What kind of tree is this?

We have just learned the weighted path length (WPL) of a tree. For a given set of leaf nodes and their weights, a Huffman tree is the binary tree whose weighted path length is the smallest possible; it is also called an optimal binary tree.

For example, given leaf nodes with weights 1, 3, 4, 6, and 8, how should we build a binary tree so that its weighted path length is as small as possible?

The guiding principle is to keep leaf nodes with small weights far from the root and leaf nodes with large weights close to the root.

The tree on the left of the figure below is a Huffman tree. Its WPL is 48, which is smaller than the 53 of the previous example:
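One way to check this number without drawing any trees: every time two nodes are merged under a new parent, the parent's weight counts each leaf below it once more, so the WPL of a Huffman tree equals the sum of the weights of all parent nodes created while merging. The sketch below (hypothetical class HuffmanWpl) combines that fact with the merge-the-two-smallest procedure that this article walks through next; for the weights 1, 3, 4, 6, 8 it prints 48.

```java
import java.util.PriorityQueue;

class HuffmanWpl {
    // WPL of a Huffman tree computed without building the tree:
    // each merge adds the new parent's weight, which counts every leaf below it once more
    static long huffmanWpl(int... weights) {
        PriorityQueue<Long> queue = new PriorityQueue<>();
        for (int w : weights) queue.add((long) w);
        long wpl = 0;
        while (queue.size() > 1) {
            long merged = queue.poll() + queue.poll(); // the two smallest weights
            wpl += merged;
            queue.add(merged);
        }
        return wpl;
    }

    public static void main(String[] args) {
        System.out.println(huffmanWpl(1, 3, 4, 6, 8)); // 48
    }
}
```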

Note that the same set of leaf nodes can form more than one Huffman tree. All of the following trees are Huffman trees:

Suppose there are 6 leaf nodes with weights 2, 3, 7, 9, 18, and 25. How do we construct a Huffman tree, that is, the tree with the smallest weighted path length?

Step 1: Build the forest

We treat each leaf node as an independent tree (a tree consisting of just a root node), which gives us a forest:

In the picture above, the forest of leaf nodes is on the right and an auxiliary queue is on the left. The queue stores all the leaf nodes in ascending order of weight. We will see what the auxiliary queue is for shortly.
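All the auxiliary queue really has to do is hand out the lightest tree first. In Java, PriorityQueue (a binary min-heap) provides exactly that; its internal array is not kept fully sorted, but every poll() returns the smallest remaining element. A small sketch with the six example weights (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class ForestQueueDemo {
    // Adds each weight (standing in for a one-node tree) to a min-heap,
    // then drains it and returns the weights in the order poll() hands them out
    static List<Integer> drainInOrder(int... weights) {
        PriorityQueue<Integer> queue = new PriorityQueue<>();
        for (int w : weights) queue.add(w);
        List<Integer> order = new ArrayList<>();
        while (!queue.isEmpty()) order.add(queue.poll());
        return order;
    }

    public static void main(String[] args) {
        System.out.println(drainInOrder(25, 9, 2, 18, 3, 7)); // [2, 3, 7, 9, 18, 25]
    }
}
```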

Step 2: Select the two nodes with the smallest current weight to generate a new parent node

With the help of the auxiliary queue, we can find the two nodes with the smallest weights, 2 and 3, and create a new parent node for them. The weight of the parent node is the sum of the weights of these two nodes:

Step 3: Remove the two smallest nodes selected in the previous step from the queue and add the new parent node to the queue

That is, remove 2 and 3 from the queue and insert 5, keeping the queue in ascending order:

Step 4: Select the two nodes with the smallest current weight to generate a new parent node.

This repeats Step 2. The two nodes with the smallest weights in the queue are now 5 and 7, so the new parent node has weight 5 + 7 = 12:

Step 5: Remove the two smallest nodes selected in the previous step from the queue and add the new parent node to the queue.

This repeats Step 3: remove 5 and 7 from the queue and insert 12, keeping the queue in ascending order:

Step 6: Select the two nodes with the smallest current weight to generate a new parent node.

This repeats Step 2. The two nodes with the smallest weights in the queue are now 9 and 12, so the new parent node has weight 9 + 12 = 21:

Step 7: Remove the two smallest nodes selected in the previous step from the queue and add the new parent node to the queue.

This repeats Step 3: remove 9 and 12 from the queue and insert 21, keeping the queue in ascending order:

Step 8: Select the two nodes with the smallest current weight to generate a new parent node.

This repeats Step 2. The two nodes with the smallest weights in the queue are now 18 and 21, so the new parent node has weight 18 + 21 = 39:

Step 9: Remove the two smallest nodes selected in the previous step from the queue and add the new parent node to the queue.

This repeats Step 3: remove 18 and 21 from the queue and insert 39, keeping the queue in ascending order:

Step 10: Select the two nodes with the smallest current weight to generate a new parent node.

This repeats Step 2. The two nodes with the smallest weights in the queue are now 25 and 39, so the new parent node has weight 25 + 39 = 64:

Step 11: Remove the two smallest nodes selected in the previous step from the queue and add the new parent node to the queue

This repeats Step 3: remove 25 and 39 from the queue and insert 64:

At this point only one node is left in the queue, which means the whole forest has been merged into a single tree. This tree is the Huffman tree we want.
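Steps 2 through 11 are the same two operations repeated until one node remains. The sketch below replays them on plain integer weights (hypothetical class HuffmanTrace) and records each merge; for 2, 3, 7, 9, 18, 25 it should list 2 + 3 = 5, 5 + 7 = 12, 9 + 12 = 21, 18 + 21 = 39 and 25 + 39 = 64, matching the walkthrough above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class HuffmanTrace {
    // Repeatedly merges the two smallest weights and records each merge as text
    static List<String> mergeSteps(int... weights) {
        PriorityQueue<Integer> queue = new PriorityQueue<>();
        for (int w : weights) queue.add(w);
        List<String> steps = new ArrayList<>();
        while (queue.size() > 1) {
            int a = queue.poll(); // smallest weight
            int b = queue.poll(); // second smallest weight
            steps.add(a + " + " + b + " = " + (a + b));
            queue.add(a + b);     // the new parent node goes back into the queue
        }
        return steps;
    }

    public static void main(String[] args) {
        mergeSteps(2, 3, 7, 9, 18, 25).forEach(System.out::println);
    }
}
```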

```java
import java.util.PriorityQueue;
import java.util.Queue;

public class HuffmanTree {

    private Node root;
    private Node[] nodes;

    // Build the Huffman tree
    public void createHuffman(int[] weights) {
        // Priority queue that helps build the Huffman tree
        Queue<Node> nodeQueue = new PriorityQueue<>();
        nodes = new Node[weights.length];

        // Build the forest and initialize the nodes array
        for (int i = 0; i < weights.length; i++) {
            nodes[i] = new Node(weights[i]);
            nodeQueue.add(nodes[i]);
        }

        // Main loop: stop when only one node is left in the queue
        while (nodeQueue.size() > 1) {
            // Take the two nodes with the smallest weights from the queue
            Node left = nodeQueue.poll();
            Node right = nodeQueue.poll();
            // Create a new node as the parent of these two nodes
            Node parent = new Node(left.weight + right.weight, left, right);
            nodeQueue.add(parent);
        }
        root = nodeQueue.poll();
    }

    // Print the weights in pre-order
    public void output(Node head) {
        if (head == null) {
            return;
        }
        System.out.println(head.weight);
        output(head.lChild);
        output(head.rChild);
    }

    public static class Node implements Comparable<Node> {
        int weight;
        Node lChild;
        Node rChild;

        public Node(int weight) {
            this.weight = weight;
        }

        public Node(int weight, Node lChild, Node rChild) {
            this.weight = weight;
            this.lChild = lChild;
            this.rChild = rChild;
        }

        @Override
        public int compareTo(Node o) {
            // Order nodes by weight (Integer.compare avoids the deprecated Integer constructor)
            return Integer.compare(this.weight, o.weight);
        }
    }

    public static void main(String[] args) {
        int[] weights = {2, 3, 7, 9, 18, 25};
        HuffmanTree huffmanTree = new HuffmanTree();
        huffmanTree.createHuffman(weights);
        huffmanTree.output(huffmanTree.root);
    }
}
```

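To guarantee that the node with the smallest weight is always the one taken out of the node queue, this code uses the priority queue PriorityQueue. It is a min-heap: its elements are not kept fully sorted, but every poll() returns the node with the smallest weight currently in the queue.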

In addition, the static nested class Node implements the Comparable interface and overrides the compareTo method, so that Node objects are ordered by weight when they are added to the queue.
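Implementing Comparable is one option. An alternative, shown here only as a sketch and not as the article's approach, is to leave the node class without Comparable and pass a Comparator to the PriorityQueue constructor instead; the class ComparatorVariant and its build method are hypothetical names.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class ComparatorVariant {
    // Hypothetical node that does NOT implement Comparable
    static class Node {
        final int weight;
        final Node lChild, rChild;

        Node(int weight, Node lChild, Node rChild) {
            this.weight = weight;
            this.lChild = lChild;
            this.rChild = rChild;
        }
    }

    // Builds a Huffman tree; the ordering by weight is supplied to the queue as a Comparator
    static Node build(int... weights) {
        PriorityQueue<Node> queue = new PriorityQueue<>(Comparator.comparingInt((Node n) -> n.weight));
        for (int w : weights) queue.add(new Node(w, null, null));
        while (queue.size() > 1) {
            Node left = queue.poll();
            Node right = queue.poll();
            queue.add(new Node(left.weight + right.weight, left, right));
        }
        return queue.poll(); // null if no weights were given
    }

    public static void main(String[] args) {
        System.out.println(build(2, 3, 7, 9, 18, 25).weight); // 64
    }
}
```

Keeping the ordering outside the node class lets the same node type be ordered differently elsewhere. With equal weights, either version may break ties differently and produce a different, but still optimal, Huffman tree.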


Origin blog.csdn.net/csdnnews/article/details/105424420