Interview Essentials | Understanding Heaps and Heap Sort

This article first appeared as: Interview Essentials | Understanding Heaps and Heap Sort
WeChat official account: 后端技术指南针 (Back-End Technology Compass)

This article explains the basic principles of heaps and heap sort. It covers the following:

  1. Heap data structure definitions
  2. Array representation of the heap
  3. Heap adjustment function
  4. Heapsort practice

1. Introduction to heaps

A heap is a special tree-shaped data structure in computer science.
A tree can be called a heap if it satisfies the following property: for any two nodes P and C in the heap, if P is the parent of C, then the value of P is less than or equal to the value of C. If every parent is less than or equal to its children, the heap is called a min-heap; if every parent is greater than or equal to its children, it is called a max-heap.
Heaps originate from heap sort, published by J. W. J. Williams in 1964, when he proposed the binary heap as the data structure for that algorithm. Heaps are also a key component of Dijkstra's algorithm and of priority queues.

The heap data structure is not the same thing as the heap used for memory allocation. The heap used in heap sort is a structure built over a collection of elements, and that heap is a binary tree.

A heap has two defining properties: the ordering of its elements and the shape of the tree.

  • Ordering of elements:
    every node in the heap and its children obey a fixed ordering relation on their values.
    A. If every node is greater than or equal to all of its children, the root is the largest of all elements, and the heap is called a max-heap (also: large-root heap, big-top heap);
    B. If every node is less than or equal to all of its children, the root is the smallest of all elements, and the heap is called a min-heap (also: small-root heap, small-top heap);
    C. A max-heap or min-heap only constrains the relation between a parent and its children; it does not constrain the relative order of sibling nodes;
    Figure: structure of a min-heap.

  • Shape of the tree:

The heap is a binary tree: each node has at most two children, and the leaves on the bottom level are packed to the left with no empty positions in the tree, i.e. the heap is a complete binary tree.

These two properties guarantee that the extreme value (minimum or maximum) can be found quickly, and that after an element is inserted or deleted the structure can be reorganized to satisfy the heap property again.

2. Array representation of the heap

A heap has no empty positions and an array is contiguous, so a heap can be stored directly in an array. Array indices normally start at 0, but for uniformity this article counts from 1, i.e. the root node is at array index 1.

With array indices, the left and right children can be found from a parent, and the parent can be found from a child.

The array elements are the result of a level-order traversal of the heap, so a heap stored in an array has the following properties:

// valid range of array subscripts
i >= 1 && i <= n
// subscript of the root
root_index = 1
// the value of the i-th node in level order equals the i-th array element
value(i) = array[i]
// the left child of the i-th heap element is at index 2*i
left_child_index(i) = i * 2
// the right child of the i-th heap element is at index 2*i + 1
right_child_index(i) = i * 2 + 1
// the parent of the i-th heap element is at index i/2 (integer division)
parent(i) = i / 2

Figure: correspondence between the heap and the array.
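The index rules above can be sketched in C++ as follows. This is a minimal sketch, not the author's code: the helper names `leftChild`, `rightChild` and `isMinHeap` are hypothetical, and it assumes the heap occupies `a[1..n]` of a `std::vector<int>` with slot 0 left unused so that the 1-based formulas apply directly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 1-based index helpers; slot 0 of the array is unused padding.
inline std::size_t leftChild(std::size_t i)  { return 2 * i; }
inline std::size_t rightChild(std::size_t i) { return 2 * i + 1; }
inline std::size_t parent(std::size_t i)     { return i / 2; }

// Returns true when a[1..n] satisfies the min-heap order,
// i.e. every non-root node is greater than or equal to its parent.
bool isMinHeap(const std::vector<int>& a, std::size_t n) {
    assert(n < a.size());  // a[0] is padding, so a must hold n + 1 slots
    for (std::size_t i = 2; i <= n; ++i) {
        if (a[parent(i)] > a[i]) return false;
    }
    return true;
}
```

For example, with the made-up array `{0, 3, 5, 8, 9, 6, 12}` (slot 0 unused), `isMinHeap(a, 6)` holds; overwriting `a[6]` with 1 breaks the order because its parent `a[3]` is 8.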

3. Heap adjustment functions

The heap adjustment process closely resembles a recursive proof by mathematical induction, as the following shows.

Pay attention! The following two functions are essential for mastering heaps.

  • How siftup works

Take a min-heap as the example. Suppose a[1...n-1] satisfies the heap property and a new element is then inserted at a[n]. Two cases arise:

A. If a[n] is greater than or equal to its parent, then a[1...n] still satisfies the heap property and no adjustment is needed;

B. If a[n] is smaller than its parent, the min-heap property no longer holds and an adjustment is needed;

Loop: the bottom-up adjustment repeatedly compares the newly added element with its parent and moves it upward, until the new node's value is greater than or equal to its parent's, or until the new node becomes the root.

Stop conditions: siftup is essentially a loop that keeps moving upward, and the key to understanding the loop is its stop conditions.

They can be seen clearly in the siftup pseudocode:

// precondition for running siftup
heap(1, n-1) == true
void siftup(n)
     i = n
     loop:
         // stop condition 1:
         // already at the root
         if i == 1:
             break;
         p = i / 2
         // stop condition 2:
         // the parent is less than or equal to this node, so no adjustment is needed
         if a[p] <= a[i]
             break;
         swap(a[p], a[i])
         // continue upward
         i = p

Demonstration of the siftup adjustment process:

Figure: the adjustment after element 16 is inserted at the tail.
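As a runnable counterpart to the pseudocode, here is a minimal C++ sketch of siftup. It assumes the heap lives in `a[1..n]` of a `std::vector<int>` with slot 0 unused, and the `heapInsert` wrapper is a hypothetical helper added only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Restores the min-heap order of a[1..n], assuming a[1..n-1] was already
// a min-heap and a[n] is the newly appended element. Slot a[0] is unused.
void siftup(std::vector<int>& a, std::size_t n) {
    assert(n >= 1 && n < a.size());
    std::size_t i = n;
    while (i > 1) {                 // stop condition 1: reached the root
        std::size_t p = i / 2;      // index of the parent
        if (a[p] <= a[i]) break;    // stop condition 2: order already holds
        std::swap(a[p], a[i]);
        i = p;                      // keep moving upward
    }
}

// Hypothetical helper: append a value at the tail and sift it up.
void heapInsert(std::vector<int>& a, int value) {
    a.push_back(value);
    siftup(a, a.size() - 1);
}
```

With a made-up heap `{0, 10, 20, 30, 40, 50}` (not the one in the figure), `heapInsert(a, 16)` places 16 at index 6, swaps it with its parent 30 at index 3, and stops there because the root 10 is smaller, leaving `{0, 10, 20, 16, 40, 50, 30}`.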

  • How siftdn works

Take a min-heap as the example. Suppose a[1...n] satisfies the heap property and the element a[1] is then updated. Two cases arise:

A. If a[1] is less than or equal to its children, the heap property still holds and no adjustment is needed;

B. If a[1] is greater than one of its children, the heap property no longer holds and an adjustment is needed;

Loop: the top-down adjustment repeatedly compares the updated element with its children and swaps it downward, until its value is less than or equal to both children, or until it reaches a leaf.

Stop conditions: siftdn is essentially a loop that keeps moving downward, and the key to understanding the loop is its stop conditions.

They can be seen clearly in the siftdn pseudocode:

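// precondition for running siftdn
heap(2, n) == true
void siftdn(n)
     i = 1
     loop:
         // index where the left child would be
         c = 2*i
         // if the left child index is out of range,
         // the current node is already a leaf
         if c > n:
             break;
         // if a right child exists,
         // pick the smaller of the two children
         // to compare with the parent
         if c+1 <= n:
             if a[c] > a[c+1]
                 c++
         // stop if the parent is less than or equal to both children
         if a[i] <= a[c]
             break;
         // the parent is greater than a child,
         // so swap it with the smaller child,
         // making that child the new parent
         swap(a[i], a[c])
         // continue downward
         i = c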

Demonstration of the siftdn adjustment process:

Figure: the adjustment after the head element is updated to 21.
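A minimal C++ sketch of siftdn, mirroring the pseudocode, might look like this. It assumes the same layout as the pseudocode (heap in `a[1..n]`, slot 0 unused); the example data is made up and is not the heap from the figure.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Restores the min-heap order of a[1..n], assuming a[2..n] already
// satisfies it and only a[1] may be out of place. Slot a[0] is unused.
void siftdn(std::vector<int>& a, std::size_t n) {
    assert(n < a.size());
    std::size_t i = 1;
    while (true) {
        std::size_t c = 2 * i;                   // left child, if it exists
        if (c > n) break;                        // i is already a leaf
        if (c + 1 <= n && a[c] > a[c + 1]) ++c;  // pick the smaller child
        if (a[i] <= a[c]) break;                 // order holds, stop
        std::swap(a[i], a[c]);
        i = c;                                   // continue downward
    }
}
```

For instance, starting from `{0, 10, 20, 16, 40, 50, 30}` and setting `a[1] = 21`, a call to `siftdn(a, 6)` swaps 21 with the smaller child 16 and then stops, because 21 is no larger than its only remaining child 30. The result is `{0, 16, 20, 21, 40, 50, 30}`.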

 

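4. Heap sort

A heap sort scenario

Suppose there are 2 million data items and the 10 largest are wanted. First build a min-heap of 10 elements, then pass all the remaining elements through it one by one: each is compared with the heap and either enters it, evicting older data, or is skipped. Once all the data has passed through, the 10 elements in the min-heap are the 10 largest.

Why the largest Top N uses a min-heap:

To select the N largest items, use a min-heap. The top of the heap is the smallest retained item, so each incoming item only needs to be compared with the top. If it is no larger than the top, it is skipped; if it is larger, it replaces the top and a siftdn adjustment finds the new element's correct position and produces a new top.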
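The Top N idea can be sketched with the standard library instead of a hand-written heap. This is only an illustration: it swaps in `std::priority_queue` with `std::greater<int>` as the min-heap, and the function name `topN` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Returns the n largest values of `data` in ascending order.
// A min-heap of at most n elements keeps the smallest retained value on top.
std::vector<int> topN(const std::vector<int>& data, std::size_t n) {
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    for (int x : data) {
        if (heap.size() < n) {
            heap.push(x);                  // heap not full yet: always accept
        } else if (n > 0 && x > heap.top()) {
            heap.pop();                    // evict the current minimum
            heap.push(x);                  // the new element finds its place
        }                                  // otherwise x is skipped
    }
    assert(heap.size() <= n);
    std::vector<int> result;
    while (!heap.empty()) {                // popping a min-heap yields ascending order
        result.push_back(heap.top());
        heap.pop();
    }
    return result;
}
```

Each incoming element costs one comparison with the top and, only when it is accepted, an O(log n) adjustment, which is why the small heap scales to millions of items.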

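Building the heap: either a top-down or a bottom-up approach works; the analysis below is bottom-up. Every leaf of the array is a single node and therefore already satisfies the definition of a binary heap. So, starting from the parents of the bottom-level leaves and moving upward one parent at a time, each subtree is turned into a binary heap; once the first node has been processed, the whole array is a binary heap.

This process mixes the ideas of siftup and siftdn: seen from a distance it proceeds bottom-up, while at each individual parent the adjustment is top-down.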
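The bottom-up construction can be sketched as follows. This is a minimal sketch under assumptions: 1-based layout with slot 0 unused, and two hypothetical names, `siftdnFrom` (siftdn generalized to start at an arbitrary node) and `buildMinHeap`.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Generalized siftdn (hypothetical name): sifts a[i] down within a[1..n],
// assuming both subtrees of i are already min-heaps. Slot a[0] is unused.
void siftdnFrom(std::vector<int>& a, std::size_t i, std::size_t n) {
    while (true) {
        std::size_t c = 2 * i;                   // left child, if it exists
        if (c > n) break;                        // i is a leaf
        if (c + 1 <= n && a[c] > a[c + 1]) ++c;  // pick the smaller child
        if (a[i] <= a[c]) break;                 // order holds, stop
        std::swap(a[i], a[c]);
        i = c;
    }
}

// Bottom-up construction: the leaves a[n/2+1..n] are trivially heaps,
// so start at the last parent n/2 and walk back to the root.
void buildMinHeap(std::vector<int>& a, std::size_t n) {
    assert(n < a.size());
    for (std::size_t i = n / 2; i >= 1; --i) {
        siftdnFrom(a, i, n);                     // top-down at each parent
    }
}
```

On the made-up input `{0, 9, 4, 7, 1, 8, 2}` this yields `{0, 1, 4, 2, 9, 8, 7}`: the outer loop runs bottom-up over the parents, and each call adjusts one subtree top-down.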

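Pass-through phase: once the heap is built, start processing the elements after the first N, from N+1 up to 2 million. Whenever an element is larger than the current heap top, it replaces the top and enters the heap, which triggers a siftdn adjustment until a new min-heap is produced.

Example code (accepted on LeetCode):

Problem: LeetCode 215, Kth Largest Element in an Array. It can be solved with the heap approach above: build a min-heap of K elements and take the top.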

//leetcode 215th the Kth Num
//Source Code:C++
#include <vector>
using std::vector;

class Solution {
public:
    // Adjust the subtree rooted at curindex into a min-heap (0-based indices)
    void heapadjust(vector<int> &nums,int curindex,int len){
        int curvalue = nums[curindex];
        int child = curindex*2+1;
        while(child<len){
            // pick the smaller of the left and right children
            if(child+1<len && nums[child] > nums[child+1]){
                child++;
            }
            // the current parent is larger than the smaller child: move the child up
            if(curvalue > nums[child]){
                nums[curindex]=nums[child];
                curindex = child;
                child = curindex*2+1;
            }else{
                break;
            }
        }
        // drop the saved value into its final position
        nums[curindex]=curvalue;
    }

    int findKthLargest(vector<int>& nums, int k) {
        // boundary conditions: k must be in 1..nums.size()
        if(k<=0 || static_cast<int>(nums.size())<k)
            return -1;
        // build a min-heap holding exactly k elements
        // from the first k elements of the array
        vector<int> subnums(nums.begin(),nums.begin()+k);
        int len = nums.size();
        int sublen = subnums.size();
        // heapify the first k elements, from the last parent back to the root
        for(int i=sublen/2-1;i>=0;i--){
            heapadjust(subnums,i,sublen);
        }
        // with the min-heap built, absorb the remaining elements one by one,
        // comparing each with the heap top and replacing the top when larger
        for(int j=k;j<len;j++){
            if(nums[j]<=subnums[0])
                continue;
            subnums[0] = nums[j];
            heapadjust(subnums,0,sublen);
        }
        return subnums[0];
    }
};

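The heapadjust in the code above is essentially the siftdn function, written for 0-based indexing (the left child of i is at 2*i+1).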
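The article applies the heap to Top N selection. For comparison, a full in-place heap sort built from the same two phases (build the heap, then repeatedly adjust it) can be sketched as below. This is an illustrative sketch, not the author's code: the names `siftDown` and `heapSortDescending` are hypothetical, indexing is 0-based, and because it uses a min-heap the result is in descending order.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// 0-based min-heap sift-down over nums[0..len-1], in the spirit of heapadjust.
void siftDown(std::vector<int>& nums, std::size_t cur, std::size_t len) {
    assert(len <= nums.size());
    while (true) {
        std::size_t child = 2 * cur + 1;               // left child, if it exists
        if (child >= len) break;                       // cur is a leaf
        if (child + 1 < len && nums[child] > nums[child + 1]) ++child;
        if (nums[cur] <= nums[child]) break;           // order holds, stop
        std::swap(nums[cur], nums[child]);
        cur = child;
    }
}

// Sorts nums in descending order in place using a min-heap:
// the minimum is repeatedly moved to the end of the shrinking heap.
void heapSortDescending(std::vector<int>& nums) {
    const std::size_t len = nums.size();
    // Phase 1: build the heap bottom-up, from the last parent to the root.
    for (std::size_t i = len / 2; i-- > 0;) {
        siftDown(nums, i, len);
    }
    // Phase 2: swap the top (minimum) into the last heap slot, shrink, re-adjust.
    for (std::size_t end = len; end > 1; --end) {
        std::swap(nums[0], nums[end - 1]);
        siftDown(nums, 0, end - 1);
    }
}
```

Using a max-heap with the comparisons reversed would give ascending order instead; the structure of the two phases stays the same.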

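5. Summary:

There are many illustrated walkthroughs of heap sort online, so this article does not repeat that process at length. In practice the key points are two processes, initializing the heap and adjusting the heap, and both depend on the siftup and siftdn functions. Mastering these two functions therefore means largely mastering heaps.

Because a heap is a binary tree, real implementations of heap adjustment combine tree traversal with loops. Master the heap adjustment process and binary tree traversal, and heaps will soon be within reach.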

6. References:

  • Programming Pearls (《编程珠玑》), Chapter 14: Heaps



Origin www.cnblogs.com/backnullptr/p/11899666.html