Previous article: heap implementation + heap sort
Contents
Analysis of the previous article Heap Sort
Proof of building small heaps in descending order
Small heap becomes large heap:
Analysis of the previous article Heap Sort
The previous article covered the binary tree notes and ended with a simple heap sort:
Ideas
First build a heap and use the property of the heap top: it is always the largest or the smallest element
Then use the property of removing the heap top: the next largest or next smallest element moves to the top
Repeating this sorts the array
Time and space complexity
A single insertion or deletion takes O(logN); the worst case equals the height of the binary tree
Every element is inserted once and deleted once, so the time complexity of the sort is O(N*logN)
The space complexity is O(N), because a separate heap is created first and the array data is copied into it, so its size grows with the size of the array
void HeapSort(int* a,int size)
{
    HP hp;
    HeapInit(&hp);
    // Time complexity: O(N*logN)
    for (int i = 0;i<size;i++)
    {
        HeapPush(&hp,a[i]);
    }
    HeapPrint(&hp);
    int j = 0;
    // Time complexity: O(N*logN)
    while (!HeapEmpty(&hp))
    {
        a[j] = HeapTop(&hp);
        j++;
        HeapPop(&hp);
    }
    HeapDestroy(&hp);
}
Optimized heap sort
Optimization goal: time complexity O(N*logN), space complexity O(1)
Previously, a separate heap was created and the array elements were inserted into it. This time the heap is built directly inside the array, turning the array itself into a heap, so the space complexity of heap sort becomes O(1)
There are two ways to build a heap inside an array:
Build the heap by adjusting upward
Build the heap by adjusting downward
Building the heap by adjusting upward
For ease of explanation, upward adjustment is shown here by building a small heap
Analysis
The heap is built directly in the array; size is the number of array elements
Before a node is adjusted upward, the elements in front of it must already form a heap
The first element is the heap top, so adjustment starts from the second element
Adjust upward in order, from front to back
diagram
time complexity
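The derivation that belonged under this heading did not survive in the text. The usual argument can be sketched as follows, assuming (as a simplification that is not from the original) a full binary tree of height h with N = 2^h - 1 nodes, where level i (root on level 1) holds 2^(i-1) nodes and each of them moves up at most i-1 levels:

```latex
\begin{aligned}
T(N) &= \sum_{i=1}^{h} 2^{i-1}(i-1) = (h-2)\,2^{h} + 2 \\
     &= (N+1)\bigl(\log_2(N+1) - 2\bigr) + 2 = O(N\log N)
\end{aligned}
```

The lowest levels hold the most nodes, and those are exactly the nodes that may travel the farthest, which is why the bound comes out as O(N*logN).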
Code
// Upward adjustment
// Example: building a small heap
void Up(HPDataType* a,size_t child)
{
    size_t parent = (child - 1) / 2;
    while (child>0)
    {
        if (a[child] < a[parent])
        {
            swap(&a[child], &a[parent]);
            child = parent;
            parent = (child - 1) / 2;
        }
        else
        {
            break;
        }
    }
}
void HeapSort(int* a,int size)
{
    // Build the heap by adjusting upward
    // By analysis, the time complexity is O(N*logN)
    for (int i = 1;i<size;i++)
    {
        Up(a,i);
    }
}
Building the heap by adjusting downward
Analysis
Before a node is adjusted downward, its left subtree and right subtree must already be heaps
Start from the last non-leaf node (the parent of the last element), treating it as the top of a sub-heap
Adjust from back to front
diagram
time complexity
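The derivation for this heading is also missing. A sketch of the usual argument, again assuming a full binary tree of height h with N = 2^h - 1 nodes (an idealization that is not from the original): level i holds 2^(i-1) nodes and each of them moves down at most h-i levels, so

```latex
\begin{aligned}
T(N) &= \sum_{i=1}^{h-1} 2^{i-1}(h-i) = 2^{h} - 1 - h \\
     &= N - \log_2(N+1) = O(N)
\end{aligned}
```

Here the crowded lowest levels move the least (the leaves do not move at all), which is why the total stays linear.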
Code
// Downward adjustment
// Example: building a small heap
void Down(HPDataType* a, size_t parent, size_t size)
{
    size_t child = parent * 2 + 1;
    while (child < size)
    {
        // Pick the smaller of the two children
        if (child + 1 <size && a[child+1] < a[child])
        {
            child++;
        }
        if (a[child] < a[parent])
        {
            swap(&a[child],&a[parent]);
            parent = child;
            child = parent * 2 + 1;
        }
        else
        {
            break;
        }
    }
}
void HeapSort(int* a,int size)
{
    // Build the heap by adjusting downward
    // By analysis, the time complexity is O(N)
    for (int i = (size-1-1)/2; i>=0; i--)
    {
        Down(a,i, size);
    }
}
Summary
Building the heap by upward adjustment has time complexity O(N*logN)
Building the heap by downward adjustment has time complexity O(N)
Therefore, it is more efficient to build the heap by downward adjustment
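To make the difference concrete, the following is a minimal sketch that counts swaps for both ways of building a small heap. The names swap_int, sift_up, sift_down, build_heap_up and build_heap_down are hypothetical helpers written for this illustration; they are not the article's Up and Down functions, although they follow the same logic.

```c
#include <stddef.h>

/* Hypothetical helper: exchange two ints. */
void swap_int(int* x, int* y)
{
    int tmp = *x;
    *x = *y;
    *y = tmp;
}

/* Sift a[child] up in a small heap; returns the number of swaps made. */
size_t sift_up(int* a, size_t child)
{
    size_t swaps = 0;
    while (child > 0)
    {
        size_t parent = (child - 1) / 2;
        if (a[child] >= a[parent])
        {
            break;
        }
        swap_int(&a[child], &a[parent]);
        child = parent;
        swaps++;
    }
    return swaps;
}

/* Sift a[parent] down in a small heap of `size` elements; returns the swaps made. */
size_t sift_down(int* a, size_t parent, size_t size)
{
    size_t swaps = 0;
    size_t child = parent * 2 + 1;
    while (child < size)
    {
        if (child + 1 < size && a[child + 1] < a[child])
        {
            child++;
        }
        if (a[child] >= a[parent])
        {
            break;
        }
        swap_int(&a[child], &a[parent]);
        parent = child;
        child = parent * 2 + 1;
        swaps++;
    }
    return swaps;
}

/* Build a small heap front to back with upward adjustment. */
size_t build_heap_up(int* a, size_t n)
{
    size_t swaps = 0;
    for (size_t i = 1; i < n; i++)
    {
        swaps += sift_up(a, i);
    }
    return swaps;
}

/* Build a small heap back to front with downward adjustment. */
size_t build_heap_down(int* a, size_t n)
{
    size_t swaps = 0;
    /* n / 2 is one past the last non-leaf node; the loop visits n/2-1 .. 0. */
    for (size_t i = n / 2; i-- > 0; )
    {
        swaps += sift_down(a, i, n);
    }
    return swaps;
}
```

For a 15-element array in descending order (15, 14, ..., 1), which is the worst case for a small heap, build_heap_up performs 34 swaps and build_heap_down performs 11.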
Sorting
Ascending order: build a large heap
Descending order: build a small heap
Thought analysis
1. Note: we only adjusted the array upward or downward to turn it into a heap
No interfaces such as deleting the heap top or inserting into the heap were written
So HeapTop (to read the heap top into the array) and HeapPop (to delete the heap top) cannot be used
2. Some readers may ask: can I write these two interfaces and use them as before?
No. Doing so inevitably requires a new array to hold the heap top elements as they come out
(deletion swaps the heap top with the last element of the heap and then removes it)
The space complexity becomes O(N)
which does not meet our optimization goal
3. So to keep the space complexity at O(1), the sort must be done inside the original array
Sorting ideas
1. Swap the heap top with the last node of the heap
2. Treat the first n-1 nodes as the heap and adjust downward from the heap top
3. The heap top is now the largest (or smallest) of the remaining elements
4. Swap the heap top with the (n-1)th element
5. Repeat this process until the array is in ascending or descending order
Note: the subscript of the last node is updated (it shrinks by one) after every round
Proof of building small heaps in descending order
The same analysis shows that ascending order needs a large heap.
Conclusion: for ascending order build a large heap; for descending order build a small heap
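The figure for this proof is missing, so here is one way to see the idea, offered as a sketch rather than the author's original argument. If a small heap were used for ascending order, the minimum would stay at a[0] and the remaining elements a[1..n) would have to serve as the next heap. In general they do not form a heap, so the heap would have to be rebuilt every round. The hypothetical helper is_min_heap below makes that easy to check.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if a[0..n) satisfies the small-heap
   property (every child >= its parent), otherwise 0. */
int is_min_heap(const int* a, size_t n)
{
    for (size_t child = 1; child < n; child++)
    {
        size_t parent = (child - 1) / 2;
        if (a[child] < a[parent])
        {
            return 0;
        }
    }
    return 1;
}
```

For the array {1, 5, 2, 6, 7, 3, 4}, is_min_heap(a, 7) returns 1, but is_min_heap(a + 1, 6) returns 0: once the first element is set aside, the parent-child relationships all shift and 5 ends up above 2. Swapping the heap top to the end instead keeps the front of the array a valid heap after one downward adjustment, which is why a small heap naturally produces descending order.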
Code
Build the heap by adjusting downward first (more efficient)
i starts at the subscript of the last non-leaf node
Record the subscript of the last element in end
The last swap and downward adjustment happen when end = 1; the loop stops once end reaches 0
Note the parameters of the downward adjustment function:
a is the starting address of the array
parent is the subscript of the parent node
size is the number of elements to adjust
void Down(HPDataType* a, size_t parent, size_t size);
In the following code, pay attention to the meaning of end
Before the while loop it is the subscript of the last element
Inside the while loop it is also the number of elements still to be adjusted
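void HeapSort(int* a,int size)
{
    // Ascending order: build a large heap
    // Descending order: build a small heap
    for (int i = (size-1-1)/2; i>=0; i--)
    {
        Down(a,i, size);
    }
    // Subscript of the last element
    // (int rather than size_t, so that size == 0 does not wrap around)
    int end = size - 1;
    while (end>0)
    {
        swap(&a[0],&a[end]);
        Down(a,0,end);
        end--;
    }
}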
That completes heap sort.
Whether the result is ascending or descending depends on whether a large heap or a small heap is built
Small heap becomes large heap:
Just flip the less-than signs to greater-than signs in the heap-building code,
that is, in the comparison between child and child+1 and the comparison between child and parent
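Instead of editing the comparison signs by hand, both comparisons can be routed through a callback. The following is a minimal sketch under that assumption; HeapCmp, less_than, greater_than, swap_int, sift_down_cmp and heap_sort_cmp are hypothetical names introduced for this illustration and do not appear in the article's code.

```c
#include <stddef.h>

/* Hypothetical callback: returns nonzero if x should sit above y in the heap. */
typedef int (*HeapCmp)(int x, int y);

int less_than(int x, int y)    { return x < y; }  /* small heap */
int greater_than(int x, int y) { return x > y; }  /* large heap */

void swap_int(int* x, int* y)
{
    int tmp = *x;
    *x = *y;
    *y = tmp;
}

/* Downward adjustment where both comparisons go through `above`. */
void sift_down_cmp(int* a, size_t parent, size_t size, HeapCmp above)
{
    size_t child = parent * 2 + 1;
    while (child < size)
    {
        /* comparison between child and child+1 */
        if (child + 1 < size && above(a[child + 1], a[child]))
        {
            child++;
        }
        /* comparison between child and parent */
        if (!above(a[child], a[parent]))
        {
            break;
        }
        swap_int(&a[child], &a[parent]);
        parent = child;
        child = parent * 2 + 1;
    }
}

/* In-place heap sort: greater_than gives ascending order (large heap),
   less_than gives descending order (small heap). */
void heap_sort_cmp(int* a, size_t size, HeapCmp above)
{
    if (size < 2)
    {
        return;
    }
    /* size / 2 is one past the last non-leaf node. */
    for (size_t i = size / 2; i-- > 0; )
    {
        sift_down_cmp(a, i, size, above);
    }
    for (size_t end = size - 1; end > 0; end--)
    {
        swap_int(&a[0], &a[end]);
        sift_down_cmp(a, 0, end, above);
    }
}
```

Calling heap_sort_cmp(a, n, greater_than) sorts in ascending order and heap_sort_cmp(a, n, less_than) sorts in descending order. The function pointer costs an indirect call per comparison, so flipping the signs directly, as the article does, is the simpler choice when only one order is needed.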
TOP-K problem
What is the TOP-K problem?
It means finding the K largest or K smallest elements in a data set, where the amount of data is usually large.
For example: the top 10 in a profession, the world's top 500 companies, rich lists, the 100 most active players in a game, and so on.
The simplest and most direct approach to the Top-K problem is to sort
But if the amount of data is very large (tens of GB), sorting is impractical: the data may not fit in memory and the efficiency is very low
A better approach is to use a heap
Ideas
1. Build a heap from the first K elements of the data set
For the K largest elements, build a small heap
For the K smallest elements, build a large heap
2. Compare each of the remaining N-K elements with the heap top in turn
With a small heap, an element larger than the heap top replaces the heap top
Adjust downward to restore the heap structure
With a large heap, an element smaller than the heap top replaces the heap top
Adjust downward to restore the heap structure
After all the comparisons, the heap holds the K largest or K smallest elements of the whole data set
The data only needs to be traversed once
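The large-heap branch of the steps above (finding the K smallest elements) can be sketched as follows. This is an illustration under stated assumptions, not the author's code: k_smallest and sift_down_max are hypothetical names, and the caller is assumed to supply an output buffer out with room for k ints, which doubles as the heap.

```c
#include <stddef.h>

/* Downward adjustment for a large heap (largest element on top). */
void sift_down_max(int* a, size_t parent, size_t size)
{
    size_t child = parent * 2 + 1;
    while (child < size)
    {
        if (child + 1 < size && a[child + 1] > a[child])
        {
            child++;
        }
        if (a[child] <= a[parent])
        {
            break;
        }
        int tmp = a[child];
        a[child] = a[parent];
        a[parent] = tmp;
        parent = child;
        child = parent * 2 + 1;
    }
}

/* Writes the k smallest values of a[0..n) into out (in heap order, not
   sorted) and returns how many values were written (k is clamped to n). */
size_t k_smallest(const int* a, size_t n, size_t k, int* out)
{
    if (k > n)
    {
        k = n;
    }
    if (k == 0)
    {
        return 0;
    }
    /* 1. Build a large heap from the first k elements. */
    for (size_t i = 0; i < k; i++)
    {
        out[i] = a[i];
    }
    for (size_t i = k / 2; i-- > 0; )
    {
        sift_down_max(out, i, k);
    }
    /* 2. An element smaller than the heap top replaces the heap top. */
    for (size_t i = k; i < n; i++)
    {
        if (a[i] < out[0])
        {
            out[0] = a[i];
            sift_down_max(out, 0, k);
        }
    }
    return k;
}
```

With the input {9, 4, 7, 1, 8, 2, 6, 3, 5, 0} and k = 3, the buffer ends up holding 0, 1 and 2, with 2 (the largest of the three) at the heap top out[0].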
Time complexity
Building a heap of K elements takes O(K); in the worst case every one of the remaining N-K elements triggers an adjustment
Each adjustment takes at most logK steps, so the adjustments cost logK*(N-K) in total
O(K+logK*(N-K))
The size of K is not known in advance, so it cannot be dropped
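As a rough feel for the numbers, assuming N = 10000 and K = 10 as illustrative values and ignoring constant factors, the heap approach compares with a full O(N*logN) sort like this:

```latex
\begin{aligned}
\text{heap: } & K + (N-K)\log_2 K = 10 + 9990 \cdot \log_2 10 \approx 3.3 \times 10^{4} \\
\text{full sort: } & N \log_2 N = 10000 \cdot \log_2 10000 \approx 1.3 \times 10^{5}
\end{aligned}
```

The gap widens as N grows while K stays small, because the heap cost depends on logK rather than logN.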
Space complexity
Only K slots are needed to build the heap
O(K)
Code
// TOP-K problem
// Requires <stdio.h> and <stdlib.h>
void PrintTopK(int* a, int n, int k)
{
    // 1. Build the heap from the first k elements of a
    int* kHeap = (int*)malloc(sizeof(int)*k);
    if (kHeap == NULL)
    {
        printf("malloc fail\n");
        exit(-1);
    }
    // Copy the first k numbers into the array kHeap
    for (int i = 0;i<k;i++)
    {
        kHeap[i] = a[i];
    }
    // Build a small heap inside kHeap (not inside a)
    for (int i = (k - 1 - 1) / 2; i >= 0; i--)
    {
        Down(kHeap, i, k);
    }
    // 2. Compare the remaining n-k elements with the heap top in turn;
    //    an element larger than the heap top replaces it
    for (int i = k;i<n;i++)
    {
        if (a[i]>kHeap[0])
        {
            kHeap[0] = a[i];
            Down(kHeap,0,k);
        }
    }
    // 3. Print the k largest elements
    for (int j = 0;j<k;j++)
    {
        printf("%d ",kHeap[j]);
    }
    printf("\n");
    free(kHeap);
}
Random data test
Generate 10,000 random numbers below 1,000,000, then set 10 scattered positions to values larger than 1,000,000
Find the ten largest of the 10,000 numbers
run the code
// Requires <time.h> in addition to <stdio.h> and <stdlib.h>
void TestTopk()
{
    int n = 10000;
    int* a = (int*)malloc(sizeof(int)*n);
    if (a == NULL)
    {
        printf("malloc fail\n");
        exit(-1);
    }
    srand((unsigned int)time(0));
    for (int i = 0; i < n; ++i)
    {
        a[i] = rand() % 1000000;
    }
    // Plant ten values that are larger than every random value
    a[5] = 1000000 + 1;
    a[1231] = 1000000 + 2;
    a[531] = 1000000 + 3;
    a[5121] = 1000000 + 4;
    a[115] = 1000000 + 5;
    a[2335] = 1000000 + 6;
    a[9999] = 1000000 + 7;
    a[76] = 1000000 + 8;
    a[423] = 1000000 + 9;
    a[3144] = 1000000 + 10;
    PrintTopK(a, n, 10);
    free(a);
}
int main()
{
    TestTopk();
    return 0;
}
Result:
The 10 largest numbers are found, but they are not in order
Add a sorting step
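// TOP-K problem
void PrintTopK(int* a, int n, int k)
{
    // 1. Build the heap from the first k elements of a
    int* kHeap = (int*)malloc(sizeof(int)*k);
    if (kHeap == NULL)
    {
        printf("malloc fail\n");
        exit(-1);
    }
    // Copy the first k numbers into the array kHeap
    for (int i = 0;i<k;i++)
    {
        kHeap[i] = a[i];
    }
    // Build a small heap inside kHeap (not inside a)
    for (int i = (k - 1 - 1) / 2; i >= 0; i--)
    {
        Down(kHeap, i, k);
    }
    // 2. Compare the remaining n-k elements with the heap top in turn;
    //    an element larger than the heap top replaces it
    for (int i = k;i<n;i++)
    {
        if (a[i]>kHeap[0])
        {
            kHeap[0] = a[i];
            Down(kHeap,0,k);
        }
    }
    // 3. Sort (sorting a small heap in place gives descending order)
    // Subscript of the last element
    // (int rather than size_t, so that k == 0 does not wrap around)
    int end = k - 1;
    while (end>0)
    {
        swap(&kHeap[0], &kHeap[end]);
        Down(kHeap, 0, end);
        end--;
    }
    // 4. Print the k largest elements after sorting
    for (int j = 0;j<k;j++)
    {
        printf("%d ",kHeap[j]);
    }
    printf("\n");
    free(kHeap);
}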
Result
That concludes these notes on heap sort and the TOP-K problem. Discussion is welcome in the comment area, and a like is always appreciated!