Application of heap -- Top-K problem (with details)


To solve the top-k problem, we first need to be familiar with one data structure: the heap (priority queue). Readers who already know it can skip ahead to part 2.

1. What is a heap?

Heap Structure

A heap is actually a binary tree, but where an ordinary binary tree stores its data in a chained (linked) structure, a heap stores its data sequentially in an array. So what kind of binary tree is suitable for sequential storage?
Suppose we store an ordinary binary tree in an array level by level. Whenever a node is missing in the middle of the tree, its slot in the array still has to be reserved, so storage space is wasted. Under what circumstances is no space wasted? When the tree is a complete binary tree.
With this array layout we cannot follow pointers to reach a child or parent node as in a chained structure; we can only compute the corresponding subscript, which is actually quite simple. For a node at subscript i, the left child is at 2 * i + 1, the right child is at 2 * i + 2, and the parent is at (i - 1) / 2 (integer division).
For example, suppose the node with value 2 is stored at subscript 1. Then:
the subscript of its left child is: 2 * 1 + 1 = 3
the subscript of its right child is: 2 * 1 + 2 = 4
Conversely, if the node with value 1 is at subscript 3 and the node with value 3 is at subscript 4, then:
the subscript of the parent of node 1 is: (3 - 1) / 2 = 1
the subscript of the parent of node 3 is: (4 - 1) / 2 = 1
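The subscript arithmetic above can be sketched as three small helpers. This is only an illustration: the class name HeapIndex and its method names are hypothetical and do not appear in the original article.

```java
// Illustrative helpers; HeapIndex and its method names are assumptions for this sketch.
class HeapIndex {
    // Subscript of the left child of the node stored at subscript i
    static int leftChild(int i) {
        return 2 * i + 1;
    }

    // Subscript of the right child of the node stored at subscript i
    static int rightChild(int i) {
        return 2 * i + 2;
    }

    // Subscript of the parent of the node at subscript i (i > 0); int division rounds down
    static int parent(int i) {
        return (i - 1) / 2;
    }

    public static void main(String[] args) {
        System.out.println(leftChild(1));  // 3
        System.out.println(rightChild(1)); // 4
        System.out.println(parent(3));     // 1
        System.out.println(parent(4));     // 1
    }
}
```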

Big Root Heap vs Small Root Heap

Big root heap (maximum heap)
A big root heap guarantees that the root node of every subtree is greater than or equal to its left and right child nodes.
To build one, start at the root of the last subtree and move backward through the root of each subtree, adjusting each subtree downward so that it becomes a big root heap. The final downward adjustment, at the root of the whole tree, ensures that the binary tree as a whole is a big root heap (this downward adjustment is also the key step in the later heap sort).
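To make the big root heap property concrete, here is a minimal sketch of a checker that verifies it for an array. The class MaxHeapCheck and the method isMaxHeap are hypothetical names introduced only for this illustration.

```java
// Illustrative checker; MaxHeapCheck and isMaxHeap are assumed names, not from the article.
class MaxHeapCheck {
    // Returns true if every parent in arr is >= each of its children
    static boolean isMaxHeap(int[] arr) {
        for (int child = 1; child < arr.length; child++) {
            int parent = (child - 1) / 2; // subscript of this node's parent
            if (arr[parent] < arr[child]) {
                return false; // a child is larger than its parent
            }
        }
        return true;
    }
}
```

Checking each child against its parent covers every parent/child pair once, so the whole check is linear in the array length.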
How do we implement this in code?
We start adjusting from the last subtree, so we first need the subscript of its root node (parent). The last node of the array has subscript len - 1, and this node is either the left child or the right child of the last subtree's root, so the root's subscript is (len - 1 - 1) / 2. Decrementing it (parent--) visits the root of every subtree in turn, and each subtree is adjusted downward. Once subscript 0 has been adjusted downward, the whole array is a big root heap.

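// Backing array and element count used by the methods below.
// These two declarations are assumed; the original snippet does not show them.
public int[] elem = new int[10];
public int usedSize;

// Turn the array into a big root heap
public void createHeap(int[] arr) {
    for (int i = 0; i < arr.length; i++) {
        elem[i] = arr[i]; // copy into elem[], assuming no resizing is needed
        usedSize++;
    }
    // Start at the root of the last subtree; parent-- visits the root of each subtree in turn
    for (int parent = (usedSize - 1 - 1) / 2; parent >= 0; parent--) {
        // Adjust downward so that each subtree becomes a big root heap
        shiftDown(parent, usedSize);
    }
}

// Adjust downward from parent so that its subtree becomes a big root heap
public void shiftDown(int parent, int len) {
    int child = parent * 2 + 1; // left child
    while (child < len) {
        // If there is a right child, pick the larger of the two children
        if (child + 1 < len && elem[child] < elem[child + 1]) {
            child++;
        }
        // Compare the larger child with the parent to decide whether to swap
        if (elem[parent] >= elem[child]) {
            break; // no adjustment needed: this subtree is already a big root heap
        }
        swap(elem, parent, child);
        parent = child; // continue checking further down
        child = parent * 2 + 1;
    }
}

public void swap(int[] arr, int i, int j) {
    int temp = arr[i];
    arr[i] = arr[j];
    arr[j] = temp;
}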
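Since building the big root heap is mainly a preparation for heap sort, here is a minimal sketch of how the same downward adjustment could drive the sort. The class HeapSortSketch and its static methods that take the array as a parameter are illustrative choices of this sketch, not the article's code.

```java
import java.util.Arrays;

// Illustrative heap sort; HeapSortSketch and its signatures are assumptions for this sketch.
class HeapSortSketch {
    // Sorts arr in ascending order, in place, using a big root heap
    static void heapSort(int[] arr) {
        // Build the big root heap, starting from the root of the last subtree
        for (int parent = (arr.length - 2) / 2; parent >= 0; parent--) {
            shiftDown(arr, parent, arr.length);
        }
        // Move the current maximum to the end, shrink the heap by one, repair the root
        for (int end = arr.length - 1; end > 0; end--) {
            swap(arr, 0, end);
            shiftDown(arr, 0, end);
        }
    }

    // Downward adjustment restricted to the first len elements of arr
    static void shiftDown(int[] arr, int parent, int len) {
        int child = parent * 2 + 1;
        while (child < len) {
            if (child + 1 < len && arr[child] < arr[child + 1]) {
                child++; // the right child is the larger one
            }
            if (arr[parent] >= arr[child]) {
                break; // heap property already holds here
            }
            swap(arr, parent, child);
            parent = child;
            child = parent * 2 + 1;
        }
    }

    static void swap(int[] arr, int i, int j) {
        int temp = arr[i];
        arr[i] = arr[j];
        arr[j] = temp;
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 9, 3, 7, 2};
        heapSort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 3, 5, 7, 9]
    }
}
```

A big root heap is used for ascending order because each round moves the largest remaining element to the end of the unsorted region.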

Small root heap (minimum heap)
A small root heap guarantees that the root node of every subtree is less than or equal to its left and right child nodes.
The adjustment process is the same as above, with the comparisons reversed.
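As a sketch of what "the same process" could look like for a small root heap, the version below only flips the comparisons of the big root heap adjustment. MinHeapSketch, buildMinHeap and shiftDownMin are hypothetical names used for illustration.

```java
// Illustrative small root heap construction; all names here are assumptions for this sketch.
class MinHeapSketch {
    // Rearranges elem in place so that it becomes a small root heap
    static void buildMinHeap(int[] elem) {
        for (int parent = (elem.length - 2) / 2; parent >= 0; parent--) {
            shiftDownMin(elem, parent, elem.length);
        }
    }

    // Downward adjustment: the smaller child moves up whenever it is below a larger parent
    static void shiftDownMin(int[] elem, int parent, int len) {
        int child = parent * 2 + 1; // left child
        while (child < len) {
            if (child + 1 < len && elem[child + 1] < elem[child]) {
                child++; // the right child is the smaller one
            }
            if (elem[parent] <= elem[child]) {
                break; // this subtree is already a small root heap
            }
            int temp = elem[parent];
            elem[parent] = elem[child];
            elem[child] = temp;
            parent = child;
            child = parent * 2 + 1;
        }
    }
}
```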

Priority Queue (PriorityQueue)

Java provides a heap data structure, PriorityQueue, also called a priority queue. A newly created PriorityQueue is by default an empty small root heap. We can add elements to it or remove elements from it, and after every insertion or removal the queue re-adjusts itself so that it remains a small root heap.

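// requires: import java.util.Comparator; import java.util.PriorityQueue;

// By default we get a small root heap
PriorityQueue<Integer> smallHeap = new PriorityQueue<>();
smallHeap.offer(23);
smallHeap.offer(2);
smallHeap.offer(11);
System.out.println(smallHeap.poll()); // removes 2; the smallest remaining element, 11, moves to the top
System.out.println(smallHeap.poll()); // removes 11

// To get a big root heap, pass in a comparator
PriorityQueue<Integer> bigHeap = new PriorityQueue<>(new Comparator<Integer>() {
    @Override
    public int compare(Integer o1, Integer o2) {
        // Reversed order; Integer.compare avoids the overflow that o2 - o1 can cause
        return Integer.compare(o2, o1);
    }
});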
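A shorter way to get a big root heap is the standard Comparator.reverseOrder(). The sketch below uses it and drains the heap to show the order in which elements come out. MaxHeapDemo and drainDescending are hypothetical names for this illustration.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Illustrative demo; MaxHeapDemo and drainDescending are assumed names.
class MaxHeapDemo {
    // Offers every value to a big root heap, then polls them all back out
    static int[] drainDescending(int[] values) {
        PriorityQueue<Integer> bigHeap = new PriorityQueue<>(Comparator.reverseOrder());
        for (int v : values) {
            bigHeap.offer(v);
        }
        int[] out = new int[values.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = bigHeap.poll(); // always the largest remaining element
        }
        return out;
    }

    public static void main(String[] args) {
        int[] result = drainDescending(new int[]{23, 2, 11});
        System.out.println(java.util.Arrays.toString(result)); // [23, 11, 2]
    }
}
```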

2. Top-k problem solving ideas

Example: Given a collection of elements, find the three smallest.

Idea 1: Sort the array in ascending order and take its first 3 elements. This works, but sorting the whole array costs O(n log n) time, which is more work than necessary when only a few elements are wanted.
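For comparison, Idea 1 might look like the sketch below. SortTopK and smallestK are hypothetical names, and the sketch assumes k is not negative.

```java
import java.util.Arrays;

// Illustrative sort-based approach; SortTopK and smallestK are assumed names.
class SortTopK {
    // Returns the k smallest elements in ascending order (assumes k >= 0)
    static int[] smallestK(int[] arr, int k) {
        int[] copy = Arrays.copyOf(arr, arr.length); // leave the caller's array untouched
        Arrays.sort(copy);                           // O(n log n) for the whole array
        return Arrays.copyOfRange(copy, 0, Math.min(k, copy.length));
    }
}
```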

Idea 2: Put all the elements into a small root heap and then pop three elements. Each popped element is the smallest in the current heap, so the three popped elements are the three smallest.
This works, but suppose there are 1,000,000 elements and we only want the three smallest: we would still build a heap of size 1,000,000. The extra space grows with the number of elements, O(n), so this method is not recommended.
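Idea 2 could be sketched as follows, which makes the space cost visible: the heap holds every element before anything is popped. FullHeapTopK and smallestK are hypothetical names for this illustration.

```java
import java.util.PriorityQueue;

// Illustrative full-heap approach; FullHeapTopK and smallestK are assumed names.
class FullHeapTopK {
    // Returns the k smallest elements in ascending order
    static int[] smallestK(int[] arr, int k) {
        PriorityQueue<Integer> minHeap = new PriorityQueue<>();
        for (int v : arr) {
            minHeap.offer(v); // the heap grows to arr.length elements
        }
        int n = Math.max(0, Math.min(k, arr.length));
        int[] ret = new int[n];
        for (int i = 0; i < n; i++) {
            ret[i] = minHeap.poll(); // smallest remaining element
        }
        return ret;
    }
}
```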

Idea 3:
We need the three smallest elements, so we build a heap of size 3. Suppose the heap has just been filled with the first three elements; these are the three smallest seen so far. If the fourth element belongs in the answer, then at least one of the three elements in the heap does not, and it has to be popped. Which one?
We want the three smallest elements, so the largest element in the heap is the one that cannot belong. That is why we build a big root heap here: its top is always the largest of the current candidates. If the new element is smaller than the top, pop the top and put the new element in; otherwise skip it. Repeat until the entire array has been traversed.
In this way we end up with a heap that contains exactly the three smallest elements. The heap never holds more than 3 elements, instead of holding all the data and then popping elements one by one.

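// Find the k smallest elements
// requires: import java.util.Comparator; import java.util.PriorityQueue;
public static int[] topK(int[] arr, int k) {
    // Guard against invalid input: PriorityQueue rejects an initial capacity below 1,
    // and polling more elements than were offered would return null
    if (arr == null || arr.length == 0 || k <= 0) {
        return new int[0];
    }
    k = Math.min(k, arr.length);
    // Create a big root heap with initial capacity k
    PriorityQueue<Integer> maxHeap = new PriorityQueue<>(k, new Comparator<Integer>() {
        @Override
        public int compare(Integer o1, Integer o2) {
            // Reversed order; Integer.compare avoids the overflow that o2 - o1 can cause
            return Integer.compare(o2, o1);
        }
    });
    for (int i = 0; i < arr.length; i++) {
        if (i < k) {
            // Put in the first k elements
            maxHeap.offer(arr[i]);
        } else if (maxHeap.peek() > arr[i]) {
            // From element k+1 on, an element enters the heap only if it is smaller than the top
            maxHeap.poll();
            maxHeap.offer(arr[i]);
        }
    }
    // Pop the k elements; they come out from largest to smallest
    int[] ret = new int[k];
    for (int i = 0; i < k; i++) {
        ret[i] = maxHeap.poll();
    }
    return ret;
}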

The above is the basic idea behind the top-k problem; other similar problems are solved in the same way.
Summary:
1. If you want the K largest elements, build a small root heap.
2. If you want the K smallest elements, build a big root heap.
3. If you want the K-th largest element, build a small root heap of size K (the top element of the heap is the answer).
4. If you want the K-th smallest element, build a big root heap of size K (the top element of the heap is the answer).
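As an illustration of item 3, the sketch below finds the K-th largest element with a small root heap that never holds more than K elements. KthLargest and kthLargest are hypothetical names, and throwing IllegalArgumentException for an out-of-range k is a design choice of this sketch.

```java
import java.util.PriorityQueue;

// Illustrative K-th largest lookup; KthLargest and kthLargest are assumed names.
class KthLargest {
    // Returns the k-th largest element of arr (k = 1 means the maximum)
    static int kthLargest(int[] arr, int k) {
        if (k < 1 || k > arr.length) {
            throw new IllegalArgumentException("k must be between 1 and arr.length");
        }
        // Small root heap holding the k largest elements seen so far
        PriorityQueue<Integer> minHeap = new PriorityQueue<>(k);
        for (int v : arr) {
            if (minHeap.size() < k) {
                minHeap.offer(v);
            } else if (v > minHeap.peek()) {
                minHeap.poll();   // drop the smallest of the current k candidates
                minHeap.offer(v);
            }
        }
        return minHeap.peek(); // smallest of the k largest = k-th largest
    }

    public static void main(String[] args) {
        System.out.println(kthLargest(new int[]{7, 2, 9, 4, 1, 8}, 2)); // 8
    }
}
```

Swapping in a big root heap and reversing the comparison gives the K-th smallest element in the same way (item 4).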

Origin blog.csdn.net/qq_45792749/article/details/124143180