Implementing heap sort (one of the eight classic sorting algorithms) + the classic TopK problem

Table of contents

1. Up and down adjustment interface of heap elements
   1. Introduction
   2. Heap element upward adjustment algorithm interface
   3. Heap element downward adjustment algorithm interface
2. Implementation of heap sort
   1. Heap sorting with a space complexity of O(N) (taking ascending order as an example)
      Idea analysis:
      Code:
      Sort test:
      Time and space complexity analysis:
   2. Heap sort with a space complexity of O(1) (taking descending order as an example)
      The idea of adjusting the array arr into a heap:
      Time complexity analysis of adjusting the array arr into a heap:
      The idea of sorting once the array arr has been adjusted into a heap
      Heap sort code implementation:
      Sorting time and space complexity analysis:
3. Solve the TopK problem with the heap data structure
   1. Problem description:
   2. Problem analysis and solution

 


1. Up and down adjustment interface of heap elements

1 Introduction

The physical and logical structure of a complete binary tree:
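Physically a complete binary tree is stored as a plain array, and the logical parent/child links reduce to index arithmetic, which the adjustment interfaces below use directly. A minimal sketch of that arithmetic follows; the helper names LeftChild, RightChild and Parent are illustrative assumptions and do not appear in the original listings:

```c
#include <assert.h>
#include <stddef.h>

/* Index arithmetic for a complete binary tree stored in an array.
   The root is at index 0; node i has its children at 2*i + 1 and 2*i + 2. */
size_t LeftChild(size_t parent)  { return 2 * parent + 1; }
size_t RightChild(size_t parent) { return 2 * parent + 2; }

/* Only meaningful for child > 0: the root (index 0) has no parent,
   and child - 1 would wrap around for an unsigned index of 0. */
size_t Parent(size_t child)
{
    assert(child > 0);
    return (child - 1) / 2;
}
```

For example, the node at subscript 2 has its children at subscripts 5 and 6, and both of those map back to parent 2.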

For an analysis of the design of the heap and of the heap element adjustment interfaces, see Qingcai's blog: http://t.csdn.cn/MKzyt. A friendly reminder from Qingcai: to understand heap sort in depth, you must first master how a heap is built.

Note: the next two interfaces implement element adjustment for a small root heap (min-heap). To use a large root heap (max-heap) instead, only the direction of the child/parent value comparisons in these interfaces needs to be reversed.

2. Heap element upward adjustment algorithm interface

Function header:

void AdjustUp(HPDataType* arry, size_t child)  //child is the number of the child node

HPDataType is a data type defined by typedef, arry is a pointer to the heap array, and child is the number of the node to be adjusted in the complete binary tree (physically, its array subscript).

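  • Algorithm call scenario: after a new element is appended at the tail of the heap, it is adjusted upward so that the heap structure is restored.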

Interface implementation:

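//Element swap interface
void Swap(HPDataType* e1, HPDataType* e2)
{
	assert(e1 && e2);
	HPDataType tem = *e1;
	*e1 = *e2;
	*e2 = tem;
}

//Upward adjustment interface for small heap elements
void AdjustUp(HPDataType* arry, size_t child)  //child is the number of the node to be adjusted
{
	assert(arry);
	size_t parent = (child - 1) / 2;           //parent of child (wraps around when child is 0, but then the loop is skipped and parent is never used)
	while (child > 0)						   //adjustment ends when child reaches 0 (the node has been moved up to the root)
	{
		if (arry[child] < arry[parent])        //parent is greater than child, so the child moves up to keep the small heap structure
		{
			Swap(arry + child, arry+parent);
			child = parent;				//the old parent position becomes the new child for the next iteration
			parent = (child - 1) / 2;	//move up to the next parent
		}
		else
		{
			break;						//parent is not greater than child: the heap structure still holds, no adjustment needed
		}
	}
}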
  • The loop ends in one of two cases:
  1. child has decreased to 0, which means the node being adjusted has reached the root position (the small root heap is restored).
  2. In some parent/child comparison the parent is not greater than the child, which means the small root heap is already restored, so we simply break out of the loop.
  • Precondition for calling this interface: the structure above the node to be adjusted (every node up to and including its own layer, but excluding the node itself) must already satisfy the small root heap property. Otherwise the adjustment is meaningless: only under this precondition does each call leave that upper structure a small root heap, now including the adjusted node, so that adjusting a newly appended leaf node turns the whole structure into a heap.
  • The upward adjustment interface for a large root heap: change the less-than sign in the comparison arry[child] < arry[parent] to a greater-than sign.
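The call scenario and the precondition can be exercised together: if every new element is appended at the tail and then adjusted upward, the array stays a small root heap after each insertion. The sketch below assumes int elements; HeapPush and IsMinHeap are hypothetical helpers introduced only for this illustration, while Swap and AdjustUp follow the interface shown above.

```c
#include <assert.h>
#include <stddef.h>

typedef int HPDataType;

void Swap(HPDataType* e1, HPDataType* e2)
{
    assert(e1 && e2);
    HPDataType tem = *e1;
    *e1 = *e2;
    *e2 = tem;
}

/* Small-heap upward adjustment, as in the interface above. */
void AdjustUp(HPDataType* arry, size_t child)
{
    assert(arry);
    while (child > 0)
    {
        size_t parent = (child - 1) / 2;
        if (arry[child] < arry[parent])
        {
            Swap(arry + child, arry + parent);
            child = parent;
        }
        else
        {
            break;
        }
    }
}

/* Hypothetical helper: append x at the heap tail, then adjust it upward.
   The caller must make sure the array has room for one more element. */
void HeapPush(HPDataType* heap, size_t* size, HPDataType x)
{
    assert(heap && size);
    heap[*size] = x;
    AdjustUp(heap, *size);
    ++*size;
}

/* Hypothetical checker: returns 1 if no child is smaller than its parent. */
int IsMinHeap(const HPDataType* heap, size_t size)
{
    for (size_t i = 1; i < size; ++i)
    {
        if (heap[i] < heap[(i - 1) / 2])
        {
            return 0;
        }
    }
    return 1;
}
```

Pushing 5, 3, 8, 1, 9, 2 in that order leaves 1 at the top of the heap, and IsMinHeap reports that the structure is valid after every push.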

3. Heap element downward adjustment algorithm interface

Function header:

void AdjustDown(HPDataType* arry,size_t size,size_t parent)

HPDataType is a data type defined by typedef, arry is a pointer to the first element of the heap array, size is the total number of elements in the heap, and parent is the number of the node to be adjusted in the complete binary tree (physically, its array subscript).

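  • Algorithm call scenario: after the heap top is deleted (the tail element is swapped to the top), the new top element is adjusted downward to restore the heap; the same interface is also used to adjust an array into a heap in place.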

Interface implementation:

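//Element swap interface
void Swap(HPDataType* e1, HPDataType* e2)
{
	assert(e1 && e2);
	HPDataType tem = *e1;
	*e1 = *e2;
	*e2 = tem;
}

//Downward adjustment interface for small heap elements
void AdjustDown(HPDataType* arry,size_t size,size_t parent)
{
	assert(arry);
	size_t child = 2 * parent + 1;   //number of the parent's left child
	while (child < size)			 //adjustment ends when child grows to size or beyond
	{
		if (child + 1 < size && arry[child + 1] < arry[child]) //pick the smaller of the left and right children
		{
			++child;
		}
		if ( arry[child] < arry[parent])//parent is greater than child, so the child moves up to keep the small heap structure
		{
			Swap(arry + parent, arry + child);
			parent = child;				//the old child position becomes the new parent for the next iteration
			child = 2 * parent + 1;		//move down to the next child
		}
		else
		{
			break;						//parent is not greater than child: the heap structure still holds, no adjustment needed
		}
	}
}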
  • Boundary conditions the algorithm needs to pay attention to:
  1. child >= size means the adjusted element has been swapped down to a leaf position; the small root heap is restored and the loop terminates.
  2. The interface uses a single child variable for the child node number of the current parent, so we must first determine which of the left and right children holds the smaller value and make child the number of that smaller child:
    if (child + 1 < size && arry[child + 1] < arry[child]) //pick the smaller of the left and right children
    {
    	++child;
    }

    The child + 1 < size check determines whether the right child of the current parent exists.

  • Precondition for calling this interface: the left and right subtrees of the node to be adjusted must already satisfy the small root heap property. Otherwise the adjustment is meaningless: only under this precondition does each call keep both subtrees small root heaps and turn the subtree rooted at the adjusted node into a heap.

  • The downward adjustment interface for a large root heap: change the two less-than signs in the comparisons arry[child + 1] < arry[child] and arry[child] < arry[parent] to greater-than signs.

For an analysis of how the heap element up and down adjustment interfaces work, see: http://t.csdn.cn/MKzyt
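The heap-top deletion that the sorting sections rely on is a direct application of the downward adjustment interface. A minimal sketch, assuming int elements: HeapPopMin is a hypothetical helper name, and Swap and AdjustDown repeat the small-heap interface shown above.

```c
#include <assert.h>
#include <stddef.h>

typedef int HPDataType;

void Swap(HPDataType* e1, HPDataType* e2)
{
    assert(e1 && e2);
    HPDataType tem = *e1;
    *e1 = *e2;
    *e2 = tem;
}

/* Small-heap downward adjustment, as in the interface above. */
void AdjustDown(HPDataType* arry, size_t size, size_t parent)
{
    assert(arry);
    size_t child = 2 * parent + 1;
    while (child < size)
    {
        if (child + 1 < size && arry[child + 1] < arry[child])
        {
            ++child;                    /* the right child is the smaller one */
        }
        if (arry[child] < arry[parent])
        {
            Swap(arry + parent, arry + child);
            parent = child;
            child = 2 * parent + 1;
        }
        else
        {
            break;
        }
    }
}

/* Hypothetical helper: remove and return the smallest element.
   The tail element replaces the top and is then adjusted downward. */
HPDataType HeapPopMin(HPDataType* heap, size_t* size)
{
    assert(heap && size && *size > 0);
    HPDataType top = heap[0];
    heap[0] = heap[*size - 1];
    --*size;
    AdjustDown(heap, *size, 0);
    return top;
}
```

Starting from the small root heap {1, 3, 2, 5, 9, 8}, successive calls to HeapPopMin return 1, 2, 3, 5, 8, 9, which is exactly the property the ascending sort in the next section uses.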

2. Implementation of heap sort

With the up and down adjustment algorithm interface of the heap elements , we can use the data structure of the heap to implement an efficient sorting algorithm .

Now suppose we are given an array of one hundred elements, each assigned a random value:

#include <stdlib.h>
#include <time.h>

typedef int HPDataType;
int main()
{
	int arr[100] = { 0 };
	srand((unsigned int)time(NULL));
	for (int i = 0; i < 100; i++)
	{
		arr[i] = rand() % 10000;        //assign a random value to each array element
	}
	
	return 0;
}

Heap sort function interface:

void HeapSort(int * arr,int size);

 arr is a pointer to the first address of the array to be sorted , and size is the number of elements in the array to be sorted

1. Heap sorting with a space complexity of O(N) (taking ascending order as an example)

Idea analysis:

  • A brute-force way to implement heap sort is:
  1. In the HeapSort interface, dynamically allocate a Heap array of the same size as the array to be sorted, to serve as the heap.
  2. Insert the elements of the array to be sorted into Heap one by one, calling the upward adjustment algorithm on each newly appended tail element to build the heap (an ascending sort builds a small root heap).
  3. Once the heap is built, remove the top element of the heap one at a time (using the heap-top deletion method; see the heap implementation at http://t.csdn.cn/vhbJf for details) and write it back into the array to be sorted, starting from its first position. Since the top element is always the smallest in the heap, this completes the sort.

Sorting algorithm diagram:

  • First, the elements of arr are inserted one by one into the Heap array to build a heap.
  • Then the top elements of Heap are moved back into arr one by one via heap-top deletion, which completes the ascending sort (the top of a small root heap is always its smallest element). Heap-top deletion means: swap the top element with the tail element, decrement the tail index (the heap now has one element fewer), then adjust the new top element downward to restore the small root heap.

Code:

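#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

typedef int HPDataType;

//Element swap interface
void Swap(HPDataType* e1, HPDataType* e2)
{
	assert(e1 && e2);
	HPDataType tem = *e1;
	*e1 = *e2;
	*e2 = tem;
}

//Upward adjustment interface for small heap elements
void AdjustUp(HPDataType* arry, size_t child)  //child is the number of the node to be adjusted
{
	assert(arry);
	size_t parent = (child - 1) / 2;           //parent of child (unused when child is 0, because the loop is skipped)
	while (child > 0)						   //adjustment ends when child reaches 0 (the node has been moved up to the root)
	{
		if (arry[child] < arry[parent])        //parent is greater than child, so the child moves up to keep the small heap structure
		{
			Swap(arry + child, arry+parent);
			child = parent;				//the old parent position becomes the new child for the next iteration
			parent = (child - 1) / 2;	//move up to the next parent
		}
		else
		{
			break;						//parent is not greater than child: the heap structure still holds, no adjustment needed
		}
	}
}

//Downward adjustment interface for small heap elements
void AdjustDown(HPDataType* arry,size_t size,size_t parent)
{
	assert(arry);
	size_t child = 2 * parent + 1;   //number of the parent's left child
	while (child < size)			 //adjustment ends when child grows to size or beyond
	{
		if (child + 1 < size && arry[child + 1] < arry[child]) //pick the smaller of the left and right children
		{
			++child;
		}
		if ( arry[child] < arry[parent])//parent is greater than child, so the child moves up to keep the small heap structure
		{
			Swap(arry + parent, arry + child);
			parent = child;				//the old child position becomes the new parent for the next iteration
			child = 2 * parent + 1;		//move down to the next child
		}
		else
		{
			break;						//parent is not greater than child: the heap structure still holds, no adjustment needed
		}
	}
}


void HeapSort(int* arr, int size)
{
	assert(arr);
	int* Heap = (int*)malloc(size * sizeof(int));
	assert(Heap);
	int ptrarr = 0;		//index into the arr array
	int ptrheap = 0;	//index into the Heap array
	//build the heap by appending the elements one by one
	while (ptrarr < size)
	{
		Heap[ptrheap] = arr[ptrarr]; //append the next element of arr to the tail of Heap
		AdjustUp(Heap, ptrheap);     //adjust each appended element upward to keep the small heap structure
		ptrheap++;
		ptrarr++;

	}
	//move the heap top back into arr one element at a time (deleting it from the heap as we go)
	ptrarr = 0;
	int HeapSize = size;
	while (ptrarr < size)
	{
		Swap(&Heap[0], &Heap[HeapSize - 1]);  //swap the top and tail elements of the heap
		arr[ptrarr] = Heap[HeapSize-1];		  //write the old heap top into arr
		HeapSize--;                           //one element fewer in the heap (the pop is complete)
		ptrarr++;                             //advance the arr index
		AdjustDown(Heap, HeapSize, 0);        //adjust the element swapped to the top downward to restore the heap
	}
	free(Heap);                               //release the auxiliary heap array
}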

Sort test:

int main()
{
	int arr[100] = { 0 };
	srand((unsigned int)time(NULL));
	for (int i = 0; i < 100; i++)
	{
		arr[i] = rand() % 10000;        //assign a random value to each array element
	}

	HeapSort(arr, 100);
	for (int i = 0; i < 100; ++i)
	{
		printf("%d ", arr[i]);
	}
	
	return 0;
}

Time and Space Complexity Analysis:

  • Building the heap by N tail insertions and emptying it by N heap-top deletions each cost O(NlogN), so the time complexity of the sort is O(NlogN).
  • The HeapSort interface allocates an additional Heap array, so the space complexity of the sort is O(N).
  • For the time complexity proofs of building a heap and deleting from it, see Qingcai's blog: http://t.csdn.cn/MKzyt
  • This version of heap sort needs a lot of code, copies a lot of data, and has high space complexity. Next, we implement a better, in-place heap sort.

2. Heap sort with a space complexity of O(1) (taking descending order as an example)

In the previous heap sorting algorithm, the Heap array was introduced to build the heap , which wasted a lot of space.

In fact, we can complete the construction of the heap on the array to be sorted ( that is, adjust the array arr into a heap ).

The idea of adjusting the array arr into a heap:

  • Given an unordered array arr, we logically regard it as a complete binary tree.
  • Next, we use the heap element downward adjustment interface to adjust arr into a small root heap:
  1. The precondition for calling the downward adjustment interface is that the left and right subtrees of the node to be adjusted already satisfy the small root heap property (only then does each call keep both subtrees small root heaps and turn the subtree rooted at the adjusted node into a heap).
  2. Given this precondition, starting the adjustment from the top of the heap (or from an arbitrary node in the middle) is meaningless, so we can only start from the substructures at the tail of the heap.
  3. The first node to adjust downward is the parent of the heap's tail element (the last non-leaf node). Starting there, we adjust each node downward in turn, moving toward the front of the array, until the root has been adjusted; at that point the entire complete binary tree is a heap.
  4. (Animation of the heap-building process.)
  • Code that adjusts the arr array into a small root heap:
    void HeapSort(int* arr, int size)
    {
    	assert(arr);
    	int parent = (size - 1 - 1) / 2;	//parent of the tail element: the first node to be adjusted downward
    	for (; parent >= 0; --parent)
    	{
    		AdjustDown(arr, size, parent);  //adjust each node downward in turn to finish building the heap
    	}
    }

Time complexity analysis of adjusting the array arr into a heap:

Assuming there are N elements in the arr array, the time complexity of adjusting arr into a heap by downward adjustment is O(N).
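The O(N) bound can be checked with the usual counting argument. The sketch below makes a simplifying assumption that is not stated in the article: the tree is a full binary tree with h levels, so N = 2^h - 1. The symbols h, i and T(N) are introduced only for this sketch. Level i (the root is level 1) holds 2^(i-1) nodes, and a node on level i moves down at most h - i levels, so the total number of moves T(N) satisfies:

```latex
\begin{aligned}
T(N) &\le \sum_{i=1}^{h-1} 2^{\,i-1}\,(h-i) \\
     &= 2^{h} - 1 - h \\
     &= N - \log_2(N+1) \\
     &= O(N)
\end{aligned}
```

As a sanity check, for h = 3 (N = 7) the sum is 1·2 + 2·1 = 4, which matches 2^3 - 1 - 3 = 4. Most nodes sit near the bottom of the tree and move only a short distance, which is why building the heap this way is cheaper than N separate insertions.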

The idea of sorting once the array arr has been adjusted into a heap

  • Once arr is a small root heap, deleting the top element of the heap one at a time sorts all the numbers in descending order (the top element is the minimum value in the heap).
  • Deleting the heap top means: swap the top element with the tail element, decrement the tail index (the heap has one element fewer), then adjust the new top element downward to restore the small root heap (so that the top is again the minimum of the remaining heap).
  • (Diagram: deleting the top elements of the heap one by one to complete the descending sort.)
  • The whole process amounts to repeatedly selecting the top of the heap (the minimum of the remaining heap) and swapping it to the end of the heap, so heap sort is a kind of selection sort.
  • As this design shows, the in-place heap sort needs only one helper: the heap element downward adjustment interface.

Heap sort code implementation:

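#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

typedef int HPDataType;

//Element swap interface
void Swap(HPDataType* e1, HPDataType* e2)
{
	assert(e1 && e2);
	HPDataType tem = *e1;
	*e1 = *e2;
	*e2 = tem;
}

//Downward adjustment interface for small heap elements
void AdjustDown(HPDataType* arry,size_t size,size_t parent)
{
	assert(arry);
	size_t child = 2 * parent + 1;   //number of the parent's left child
	while (child < size)			 //adjustment ends when child grows to size or beyond
	{
		if (child + 1 < size && arry[child + 1] < arry[child]) //pick the smaller of the left and right children
		{
			++child;
		}
		if ( arry[child] < arry[parent])//parent is greater than child, so the child moves up to keep the small heap structure
		{
			Swap(arry + parent, arry + child);
			parent = child;				//the old child position becomes the new parent for the next iteration
			child = 2 * parent + 1;		//move down to the next child
		}
		else
		{
			break;						//parent is not greater than child: the heap structure still holds, no adjustment needed
		}
	}
}

void HeapSort(int* arr, int size)
{
	assert(arr);
	int parent = (size - 1 - 1) / 2;	//parent of the tail element: the first node to be adjusted downward
	for (; parent >= 0; --parent)
	{
		AdjustDown(arr, size, parent);  //adjust each node downward in turn to finish building the heap
	}

	while (size > 0)					//delete the heap top one element at a time to sort in descending order; size serves as the heap tail index
	{
		Swap(&arr[0], &arr[size - 1]);  //swap the tail and top elements of the heap
		size--;							//move the tail index back by one: one element fewer in the heap
		AdjustDown(arr, size, 0);       //adjust the new top downward to restore the small root heap
	}
}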

Sort interface tests: 

int main()
{
	int arr[100] = { 0 };
	srand((unsigned int)time(NULL));
	for (int i = 0; i < 100; i++)
	{
		arr[i] = rand() % 10000;      //assign a random value to each array element
	}

	HeapSort(arr, 100);
	for (int i = 0; i < 100; ++i)
	{
		printf("%d ", arr[i]);
	}
	
	return 0;
}

Sorting time and space complexity analysis:

  • Deleting the top elements one by one until the heap is empty costs O(NlogN). For the proof and analysis, see Qingcai's blog: http://t.csdn.cn/vhbJf
  • Adjusting the arr array into a heap costs O(N), so the overall time complexity of heap sort is O(N) + O(NlogN) = O(NlogN).
  • The sort uses only a constant amount of extra memory, so its space complexity is O(1).
  • Heap sort is therefore an efficient selection sorting algorithm.
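The listing above sorts in descending order with a small root heap. By the same reasoning, an in-place ascending sort uses a large root heap: the maximum is repeatedly swapped to the tail. The sketch below illustrates that variant under the article's "reverse the comparisons" rule; the names SwapInt, AdjustDownMax and HeapSortAscending are assumptions made for this example and are not part of the original code.

```c
#include <assert.h>
#include <stddef.h>

void SwapInt(int* a, int* b)
{
    int tem = *a;
    *a = *b;
    *b = tem;
}

/* Large-heap downward adjustment: the small-heap version with both
   comparisons reversed. */
void AdjustDownMax(int* arr, size_t size, size_t parent)
{
    size_t child = 2 * parent + 1;
    while (child < size)
    {
        if (child + 1 < size && arr[child + 1] > arr[child])
        {
            ++child;                    /* the right child is the larger one */
        }
        if (arr[child] > arr[parent])
        {
            SwapInt(&arr[parent], &arr[child]);
            parent = child;
            child = 2 * parent + 1;
        }
        else
        {
            break;
        }
    }
}

/* In-place ascending heap sort: O(NlogN) time, O(1) extra space. */
void HeapSortAscending(int* arr, size_t size)
{
    assert(arr || size == 0);
    /* Build the large root heap; the last parent is at index size/2 - 1. */
    for (size_t i = size / 2; i-- > 0; )
    {
        AdjustDownMax(arr, size, i);
    }
    /* Move the current maximum to the tail and shrink the heap by one. */
    while (size > 1)
    {
        SwapInt(&arr[0], &arr[size - 1]);
        --size;
        AdjustDownMax(arr, size, 0);
    }
}
```

The loop condition size > 1 stops one step earlier than the descending listing, since a heap with a single element is already in place.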

3. Solve the TopK problem with the heap data structure

The TopK problem refers to selecting the K largest (or smallest) values from an array of N elements (K <= N).

There is a related problem on LeetCode:

Interview Question 17.14. Smallest K Numbers - LeetCode

1.  Problem description:

Design an algorithm to find the smallest k numbers in the array . The k numbers can be returned in any order. (The number of array elements is arrSize)

(k<=arrSize)

Example:

Input: arr = [1,3,5,7,2,4,6,8], k = 4
Output: [1,2,3,4]

Solution interface: 

int* smallestK(int* arr, int arrSize, int k, int* returnSize)
{

}

arrSize is the number of elements in the array, k is how many of the smallest numbers to find, and returnSize receives the number of elements in the result array.

 2. Problem analysis and solution

  • In theory the problem can be solved by sorting arr directly, but at O(NlogN) that is less efficient than necessary, and sorting the whole array just to get k elements is overkill.
  • A heap gives one of the best solutions to this problem:
  1. Create an array Heap of k*sizeof(int) bytes to store a large root heap.
  2. Insert the first k elements of arr into Heap to build the large root heap.
  3. Compare each of the remaining (arrSize-k) elements of arr with the top of Heap. If an element is smaller than the heap top, swap it with the heap top and then adjust the new top downward to restore the large root heap (the element has been swapped into the heap).
  4. After all of the remaining (arrSize-k) elements have been compared, the k elements left in the heap are the k smallest elements of arr.

Algorithm Diagram: 

Proof of the rationality of the algorithm:

  • The top of a large root heap is its largest element. Every element that is rejected or swapped out during the comparisons is greater than or equal to the heap top at that moment, and the heap top never increases afterwards. Every element outside the final heap is therefore greater than or equal to all k elements in it, so the final heap holds the k smallest elements of arr.

Solution code:

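#include <assert.h>
#include <stdlib.h>

void Swap(int* e1 ,int* e2)
{
    int tem = *e1;
    *e1 = *e2;
    *e2 = tem;
}

//Upward adjustment interface for large heap elements
void AdjustUp(int * arry, size_t child)     //child is the number of the node to be adjusted
{
	assert(arry);
	size_t parent = (child - 1) / 2;
	while (child > 0)						//adjustment ends when child reaches 0
	{
		if (arry[child] > arry[parent])     //parent is smaller than child, so the child moves up to keep the large heap structure
		{
			Swap(arry + child, arry+parent);
			child = parent;				    //the old parent position becomes the new child for the next iteration
			parent = (child - 1) / 2;	    //move up to the next parent
		}
		else
		{
			break;						    //parent is not smaller than child: the heap structure still holds, no adjustment needed
		}
	}
}


//Downward adjustment interface for large heap elements
void AdjustDown(int * arry,size_t size,size_t parent)
{
	assert(arry);
	size_t child = 2 * parent + 1;   //number of the parent's left child
	while (child < size)			 //adjustment ends when child grows to size or beyond
	{
		if (child + 1 < size && arry[child + 1] > arry[child]) //pick the larger of the left and right children
		{
			++child;
		}
		if ( arry[child] > arry[parent])//parent is smaller than child, so the child moves up to keep the large heap structure
		{
			Swap(arry + parent, arry + child);
			parent = child;				//the old child position becomes the new parent for the next iteration
			child = 2 * parent + 1;		//move down to the next child
		}
		else
		{
			break;						//parent is not smaller than child: the heap structure still holds, no adjustment needed
		}
	}
}

int* smallestK(int* arr, int arrSize, int k, int* returnSize)
{
    if(0==k)
    {
        *returnSize =0;
        return NULL;
    }
    int * Heap = (int*)malloc(k*sizeof(int)); //allocate an array of k elements to store the heap
    assert(Heap);
    *returnSize = k;
    int ptrHeap =0;                      //index of the heap tail
    while(ptrHeap<k)                     //append the first k elements of arr to Heap to build the heap
    {
        Heap[ptrHeap]=arr[ptrHeap];
        AdjustUp(Heap,ptrHeap);    
        ptrHeap++;
    }


    int ptrarr = k;             //index used to traverse the remaining (arrSize-k) elements of arr
    while(ptrarr < arrSize)     //compare each remaining element of arr with the top of Heap
    {
        //if a remaining element is smaller than the heap top, swap it into the heap
        //and restore the large root heap with the downward adjustment interface
        if(Heap[0]>arr[ptrarr])
        {
            Swap(&Heap[0],&arr[ptrarr]);
            AdjustDown(Heap,k,0);
        }
        ptrarr++;
    }
    return Heap;                  //return the Heap array as the result
}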

Algorithm time and space complexity analysis:

Let the number of elements in the array arr be N

  • Building the heap in the Heap array by k tail insertions costs O(klogk).
  • Comparing the remaining (N-k) elements of arr with the heap top and swapping them into the heap costs O((N-k)logk) (in the worst case, every one of the (N-k) elements is swapped into Heap and adjusted all the way down to a leaf position).
  • So the overall time complexity of the algorithm is O(Nlogk).
  • The only extra storage is the k-element Heap array, so the space complexity of the algorithm is O(k).

The idea behind the TopK solution has important practical significance:

For example, if one billion values are stored on a hard disk and we want the 100 smallest, this approach needs memory for only 100 heap elements and a single pass over the data, in O(Nlogk) time.
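The hard-disk scenario can be sketched as a streaming variant of smallestK in which values are offered to a k-element large root heap one at a time, so the full data set never has to be in memory. The sketch makes several assumptions: the data is a stream of whitespace-separated decimal integers, k is at least 1, and the names SwapInt, AdjustDownMax, SmallestKOffer and SmallestKFromStream are hypothetical. It also builds the heap with one bottom-up pass once k values have arrived, instead of k upward adjustments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

void SwapInt(int* a, int* b)
{
    int tem = *a;
    *a = *b;
    *b = tem;
}

/* Large-heap downward adjustment. */
void AdjustDownMax(int* heap, size_t size, size_t parent)
{
    size_t child = 2 * parent + 1;
    while (child < size)
    {
        if (child + 1 < size && heap[child + 1] > heap[child])
        {
            ++child;
        }
        if (heap[child] > heap[parent])
        {
            SwapInt(&heap[parent], &heap[child]);
            parent = child;
            child = 2 * parent + 1;
        }
        else
        {
            break;
        }
    }
}

/* Offer one value to the "k smallest so far" stored in heap[0..k-1].
   *count is the number of filled slots; once it reaches k the array
   is kept as a large root heap. */
void SmallestKOffer(int* heap, size_t k, size_t* count, int x)
{
    assert(heap && count && k > 0);
    if (*count < k)
    {
        heap[(*count)++] = x;
        if (*count == k)                /* buffer full: build the heap once */
        {
            for (size_t i = k / 2; i-- > 0; )
            {
                AdjustDownMax(heap, k, i);
            }
        }
    }
    else if (x < heap[0])               /* x beats the largest value kept */
    {
        heap[0] = x;
        AdjustDownMax(heap, k, 0);
    }
}

/* Read integers from in until EOF or a parse failure; return how many
   values ended up in heap (fewer than k if the stream is short). */
size_t SmallestKFromStream(FILE* in, int* heap, size_t k)
{
    assert(in);
    size_t count = 0;
    int x;
    while (fscanf(in, "%d", &x) == 1)
    {
        SmallestKOffer(heap, k, &count, x);
    }
    return count;
}
```

A caller would open the data file with fopen, pass the FILE pointer together with a 100-element buffer to SmallestKFromStream, and read the result out of that buffer in any order. Memory use stays at k integers regardless of how large the file is.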

Origin blog.csdn.net/weixin_73470348/article/details/129325295