Heap sort and the TopK problem

One, implementation of heap-related function interfaces

[Figure: an example of a large heap]

A heap is a complete binary tree and comes in two forms: the large heap and the small heap.
Large heap: every parent node is greater than or equal to its child nodes. The picture above shows a large heap.
Small heap: every parent node is less than or equal to its child nodes.
When the nodes are stored level by level in a table, what is the relationship between the subscripts of a parent node and its child nodes?
leftchild = parent * 2 + 1
rightchild = parent * 2 + 2
From these two formulas it follows that the subscript of the parent node is
parent = (child - 1) / 2
using integer division, so the same formula works for both the left child and the right child.
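The subscript relationships can be checked with a few lines of C. This is only a sketch: the function names LeftChild, RightChild, Parent and PrintRelations are made up for this illustration and are not part of the heap interface in this article.

```c
#include <assert.h>
#include <stdio.h>

/* Subscript arithmetic for a heap stored in an array (subscripts start at 0). */
int LeftChild(int parent)  { return parent * 2 + 1; }
int RightChild(int parent) { return parent * 2 + 2; }

/* The root (subscript 0) has no parent, so child must be at least 1. */
int Parent(int child)
{
	assert(child > 0);
	return (child - 1) / 2;
}

/* Print, for every element, the subscripts of its parent and children. */
void PrintRelations(const int* a, int n)
{
	for (int i = 0; i < n; i++)
	{
		printf("a[%d]=%d", i, a[i]);
		if (i > 0)
			printf(" parent=a[%d]", Parent(i));
		if (LeftChild(i) < n)
			printf(" left=a[%d]", LeftChild(i));
		if (RightChild(i) < n)
			printf(" right=a[%d]", RightChild(i));
		printf("\n");
	}
}
```

Because the division truncates, Parent(5) and Parent(6) both give 2, which matches LeftChild(2) == 5 and RightChild(2) == 6.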
A sequence table is used to store the heap. The data structure
mainly provides the following functional interface:

#pragma once

#include<stdio.h>
#include<stdlib.h>
#include<string.h>   // memcpy, used by HeapCreate
#include<assert.h>
#include<stdbool.h>

typedef int HPDataType;
typedef struct Heap
{
	HPDataType* a;   // dynamically allocated array that stores the heap
	int capcity;     // number of elements the array can hold
	int size;        // number of elements currently in the heap
}Heap;

void HeapInit(Heap* php);
void HeapDestroy(Heap* php);
void HeapPush(Heap* php, HPDataType x);
void HeapPop(Heap* php);
HPDataType HeapTop(Heap* php);
bool HeapEmpty(Heap* php);
int HeapSize(Heap* php);
void HeapCreate(Heap* php, HPDataType* a, int size);
void AdjustDown(HPDataType* a, int size, int parent);
void AdjustUp(HPDataType* a, int child);
void swep(HPDataType* p1, HPDataType* p2);   // exchange two elements

1. Heap initialization

Initialization is simple: it sets up an empty sequence table to store the heap.

void HeapInit(Heap* php)
{
	assert(php);
	php->a = NULL;
	php->capcity = 0;
	php->size = 0;
}

2. Destruction of the heap

void HeapDestroy(Heap* php)
{
	assert(php);
	free(php->a);
	php->a = NULL;
	php->capcity = 0;
	php->size = 0;
}

3. Insert

[Figure: the large heap before insertion]
A new node is always placed at the end of the array. In the heap shown above, that position is the left child of 6.
If the value of the inserted node is less than or equal to 6, the heap structure is not affected and it is still a large heap.
If the value of the inserted node is greater than 6, the heap is no longer a large heap, so the node must be adjusted upward. During the adjustment only the node being adjusted and its current parent are compared and exchanged, and the process repeats until the node is no larger than its parent or it reaches the root.
[Figure: the upward adjustment, one step at a time]
The diagram shows the adjustment process: the node moves upward one level at a time.
Before inserting, also check whether the array needs to be expanded.

void HeapPush(Heap* php, HPDataType x)
{
	assert(php);
	// expand when the array is full: start with 4 elements, then double
	if (php->capcity == php->size)
	{
		int newcapcity = (php->capcity == 0 ? 4 : php->capcity * 2);
		HPDataType* tmp = (HPDataType*)realloc(php->a, sizeof(HPDataType) * newcapcity);
		if (!tmp)
		{
			perror("realloc fail");
			exit(-1);
		}
		php->a = tmp;
		php->capcity = newcapcity;
	}
	// append at the end, then adjust the new node upward
	php->a[php->size] = x;
	php->size++;
	AdjustUp(php->a, php->size - 1);
}

4. Adjust upward

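// exchange two elements; used by AdjustUp, AdjustDown, HeapPop and HeapSort
void swep(HPDataType* p1, HPDataType* p2)
{
	HPDataType tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

void AdjustUp(HPDataType* a, int child)
{
	int parent = (child - 1) / 2;
	while (child > 0)
	{
		// large heap: a child larger than its parent moves up one level
		if (a[child] > a[parent])
		{
			swep(&a[child], &a[parent]);
			child = parent;
			parent = (child - 1) / 2;
		}
		else
		{
			break;
		}
	}
}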
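To see the upward adjustment at work without the Heap struct, here is a minimal sketch on a plain int array. PushFixed is a hypothetical helper written for this example, and the swep shown here is an assumed definition of the exchange helper.

```c
#include <assert.h>

/* Assumed definition of the exchange helper. */
void swep(int* p1, int* p2)
{
	int tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

/* Upward adjustment for a large heap. */
void AdjustUp(int* a, int child)
{
	int parent = (child - 1) / 2;
	while (child > 0 && a[child] > a[parent])
	{
		swep(&a[child], &a[parent]);
		child = parent;
		parent = (child - 1) / 2;
	}
}

/* Hypothetical helper: append x to a heap kept in a caller-provided
   array with room for capacity elements, then restore the heap. */
void PushFixed(int* heap, int* size, int capacity, int x)
{
	assert(*size < capacity);
	heap[*size] = x;
	(*size)++;
	AdjustUp(heap, *size - 1);
}
```

Pushing 3, 9, 6 and 10 in that order into an empty array leaves it as 10 9 6 3: the 10 is appended at subscript 3, exchanged with its parent 3, and then exchanged with the root 9.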

5. Delete

Deletion normally removes the node at the top of the heap: in a large heap the top holds the largest value and in a small heap the smallest, so removing any other node is rarely useful.
There is a trick to deleting. If we simply shift the remaining data forward to overwrite the first element, the parent-child relationships all change and the array is no longer a heap. Instead, first exchange the top element
with the last element, then do size-- so that the old top is no longer part of the heap.
At this point the left subtree and the right subtree of the new top are still large heaps, so only the top needs to be adjusted downward, in the same way as the upward adjustment, until the whole structure is a large heap again.

Pay special attention when adjusting downward: if the parent node is smaller than a child node, the parent must be exchanged with the larger of its left and right children. Exchanging with the smaller child would place it above the larger one and break the heap.

void HeapPop(Heap* php)
{
	assert(php);
	assert(!HeapEmpty(php));
	// exchange the first and last elements
	swep(&php->a[0], &php->a[php->size - 1]);
	php->size--;
	// adjust the new top downward
	AdjustDown(php->a, php->size, 0);
}

6. Adjust downward

void AdjustDown(HPDataType* a, int size, int parent)
{
	int child = parent * 2 + 1;
	while (child < size)
	{
		// if the right child exists and is larger than the left child, choose the right child
		if (child + 1 < size && a[child + 1] > a[child])
		{
			child++;
		}
		// large heap: a parent smaller than its larger child moves down one level
		if (a[child] > a[parent])
		{
			swep(&a[child], &a[parent]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			break;
		}
	}
}
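The delete-then-adjust steps can be tried on a plain int array. The sketch below assumes a large heap; PopTop is a hypothetical helper for this example that also returns the removed value, and swep is an assumed definition of the exchange helper.

```c
#include <assert.h>

/* Assumed definition of the exchange helper. */
void swep(int* p1, int* p2)
{
	int tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

/* Downward adjustment for a large heap. */
void AdjustDown(int* a, int size, int parent)
{
	int child = parent * 2 + 1;
	while (child < size)
	{
		/* choose the larger of the two children */
		if (child + 1 < size && a[child + 1] > a[child])
			child++;
		if (a[child] <= a[parent])
			break;
		swep(&a[child], &a[parent]);
		parent = child;
		child = parent * 2 + 1;
	}
}

/* Hypothetical helper: remove the top of a large heap and return it. */
int PopTop(int* heap, int* size)
{
	assert(*size > 0);
	int top = heap[0];
	swep(&heap[0], &heap[*size - 1]);   /* old top moves to the end */
	(*size)--;                          /* and leaves the heap */
	AdjustDown(heap, *size, 0);
	return top;
}
```

Starting from the heap 10 9 6 3, four calls to PopTop return 10, 9, 6 and 3, so repeatedly deleting the top of a large heap yields the values in descending order.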

7. Build the heap

The approach used so far builds the heap by inserting elements one at a time, which is relatively inefficient.
Here are two ways to build a heap directly from an array, one based on upward adjustment and one based on downward adjustment.

The first builds the heap by adjusting upward,
and the second builds the heap by adjusting downward.

Building by upward adjustment relies on the fact that, before each element is adjusted, the elements in front of it already form a large heap.
Given an array, start adjusting upward from subscript 1 (a single element is already both a large heap and a small heap) and continue until the last element.

Building by downward adjustment relies on the fact that the left and right subtrees of the node being adjusted are both large heaps.
[Figure: an array viewed as a complete binary tree that is not yet a heap]
As shown in the figure above, the left and right subtrees of 6 are not large heaps, so we cannot start adjusting from 6. We can, however, work from the bottom up, starting at the last non-leaf node, whose subscript is (size - 1 - 1) / 2. The left and right subtrees of 5 are large heaps, so adjust 5 first, then adjust 9 and 6 in turn.

void HeapCreate(Heap* php, HPDataType* a, int size)
{
	assert(php);
	assert(a && size > 0);
	php->a = (HPDataType*)malloc(sizeof(HPDataType) * size);
	if (!php->a)
	{
		perror("malloc fail");
		exit(-1);
	}
	php->capcity = size;
	php->size = size;
	memcpy(php->a, a, sizeof(HPDataType) * size);   // needs <string.h>
	// build the heap by adjusting upward
	/*for (int i = 1; i < size; i++)
	{
		AdjustUp(php->a, i);
	}*/
	// build the heap by adjusting downward, starting from the last non-leaf node
	for (int i = (size - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(php->a, size, i);
	}
}
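Both ways of building can be compared on the same input. The following is a sketch only: BuildUp, BuildDown and IsLargeHeap are hypothetical names for this example, swep is an assumed definition of the exchange helper, and the code works on a plain int array instead of the Heap struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed definition of the exchange helper. */
void swep(int* p1, int* p2)
{
	int tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

void AdjustUp(int* a, int child)
{
	int parent = (child - 1) / 2;
	while (child > 0 && a[child] > a[parent])
	{
		swep(&a[child], &a[parent]);
		child = parent;
		parent = (child - 1) / 2;
	}
}

void AdjustDown(int* a, int size, int parent)
{
	int child = parent * 2 + 1;
	while (child < size)
	{
		if (child + 1 < size && a[child + 1] > a[child])
			child++;
		if (a[child] <= a[parent])
			break;
		swep(&a[child], &a[parent]);
		parent = child;
		child = parent * 2 + 1;
	}
}

/* Build by upward adjustment: a[0..i-1] is already a large heap before a[i] is adjusted. */
void BuildUp(int* a, int size)
{
	for (int i = 1; i < size; i++)
		AdjustUp(a, i);
}

/* Build by downward adjustment: start at the last non-leaf node and work back to the root. */
void BuildDown(int* a, int size)
{
	for (int i = (size - 2) / 2; i >= 0; i--)
		AdjustDown(a, size, i);
}

/* Check the large-heap property: no element is larger than its parent. */
bool IsLargeHeap(const int* a, int size)
{
	assert(a != NULL || size == 0);
	for (int i = 1; i < size; i++)
	{
		if (a[i] > a[(i - 1) / 2])
			return false;
	}
	return true;
}
```

The two methods may arrange the elements differently, but both results pass IsLargeHeap and both put the largest value at subscript 0.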

8. Take the top of the heap

HPDataType HeapTop(Heap* php)
{
	assert(php);
	assert(!HeapEmpty(php));
	return php->a[0];
}

9. Check whether the heap is empty

bool HeapEmpty(Heap* php)
{
	assert(php);
	return php->size == 0;
}

10. Size of the heap

int HeapSize(Heap* php)
{
	assert(php);
	return php->size;
}

Two, the time complexity of building the heap by upward and by downward adjustment

Two ways of building a heap were introduced above, so which one should we use in practice?
The more efficient one, of course, so let's work out the time complexity of each method.

Building the heap by adjusting downward
Time complexity: O(N)

Building the heap by adjusting upward
Time complexity: O(N*logN)

So building the heap by downward adjustment is more efficient than building it by upward adjustment.
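The difference can be checked with a worst-case count of how many levels the nodes move. As a simplifying assumption, take a full binary tree of height h with N = 2^h - 1 nodes, where level i (the root is level 1) holds 2^(i-1) nodes. In downward building a node on level i moves down at most h - i levels; in upward building it moves up at most i - 1 levels. The symbols h, i, T_down and T_up are introduced only for this sketch.

```latex
\begin{aligned}
T_{\text{down}}(N) &= \sum_{i=1}^{h-1} 2^{\,i-1}\,(h-i) = 2^{h} - 1 - h = N - \log_2(N+1) = O(N) \\
T_{\text{up}}(N)   &= \sum_{i=2}^{h} 2^{\,i-1}\,(i-1) = (h-2)\,2^{h} + 2 = (N+1)\bigl(\log_2(N+1) - 2\bigr) + 2 = O(N \log N)
\end{aligned}
```

The bottom levels hold most of the nodes. Downward building moves those nodes the least, while upward building moves them the most, which is where the extra logN factor comes from.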

Three, heap sort

The idea of heap sort is to first build a heap in place from the given array by downward adjustment (build a large heap for ascending order and a small heap for descending order).
Taking ascending order as an example: once the heap is built, the top of the heap is the largest value in the array. Exchange it with the last element of the heap so that the largest value sits at the end, then shrink the heap by one (size--) so that later steps do not touch the value just placed. Adjust the new top downward to restore the heap, and repeat these steps until only one element is left in the heap.

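void HeapSort(int* a, int size)
{
	// ascending order: build a large heap
	for (int i = (size - 2) / 2; i >= 0; i--)
	{
		AdjustDown(a, size, i);
	}
	int end = size - 1;
	while (end > 0)
	{
		// move the current maximum to the end, then restore the heap on a[0..end-1]
		swep(&a[0], &a[end]);
		AdjustDown(a, end, 0);
		end--;
	}
}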

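The time complexity of heap sort is derived in much the same way as for building a heap by upward adjustment: building the heap costs O(N), and each of the N - 1 exchanges is followed by a downward adjustment of at most logN levels.
Time complexity: O(N*logN). Space complexity: O(1).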
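For descending order the same procedure is used with a small heap. Here is a minimal sketch: AdjustDownSmall and HeapSortDesc are hypothetical names for this example, and swep is an assumed definition of the exchange helper.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed definition of the exchange helper. */
void swep(int* p1, int* p2)
{
	int tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

/* Downward adjustment for a small heap: the parent sinks while it is
   larger than its smaller child. */
void AdjustDownSmall(int* a, int size, int parent)
{
	int child = parent * 2 + 1;
	while (child < size)
	{
		/* choose the smaller of the two children */
		if (child + 1 < size && a[child + 1] < a[child])
			child++;
		if (a[child] >= a[parent])
			break;
		swep(&a[child], &a[parent]);
		parent = child;
		child = parent * 2 + 1;
	}
}

/* Sort a[0..size-1] in descending order, in place. */
void HeapSortDesc(int* a, int size)
{
	assert(a != NULL || size == 0);
	/* descending order: build a small heap */
	for (int i = (size - 2) / 2; i >= 0; i--)
		AdjustDownSmall(a, size, i);
	/* move the current minimum to the end and shrink the heap */
	for (int end = size - 1; end > 0; end--)
	{
		swep(&a[0], &a[end]);
		AdjustDownSmall(a, end, 0);
	}
}
```

Each pass places the smallest remaining value at the end of the unsorted part, so the array ends up ordered from largest to smallest.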

Four, TopK problem

There are two ways of approaching the TopK problem.
We take selecting the largest K numbers as an example.
1. Build a large heap from all the data, then K times take the top of the heap, delete it, and adjust downward.
The disadvantage of this idea is that it does not work when the amount of data is too large to fit in memory.
2. Build a small heap with K nodes from the first K values. Since the top of a small heap is the smallest value in it, traverse the rest of the data and, whenever a value is larger than the top of the heap, replace the top with it and adjust downward. When the traversal is complete, the largest K numbers are all in the heap.
This idea only needs memory for K values, and when the amount of data is large it can be read from a file one value at a time.
The code below implements idea 2.

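// Downward adjustment for a small heap: the same as AdjustDown
// with both comparisons reversed. TopK needs this version because
// AdjustDown above maintains a large heap.
void AdjustDownSmall(int* a, int size, int parent)
{
	int child = parent * 2 + 1;
	while (child < size)
	{
		// choose the smaller of the two children
		if (child + 1 < size && a[child + 1] < a[child])
		{
			child++;
		}
		if (a[child] < a[parent])
		{
			swep(&a[child], &a[parent]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			break;
		}
	}
}

void TopK(int* a, int size, int k)
{
	assert(a && k > 0 && k <= size);
	int* minHeap = (int*)malloc(sizeof(int) * k);
	if (!minHeap)
	{
		perror("malloc fail");
		exit(-1);
	}
	int j = 0;
	for (j = 0; j < k; j++)
	{
		minHeap[j] = a[j];
	}
	// build a small heap from the first k values
	for (int i = (k - 2) / 2; i >= 0; i--)
	{
		AdjustDownSmall(minHeap, k, i);
	}
	// a value larger than the smallest of the current top k replaces it
	for (; j < size; j++)
	{
		if (a[j] > minHeap[0])
		{
			minHeap[0] = a[j];
			AdjustDownSmall(minHeap, k, 0);
		}
	}
	for (int i = 0; i < k; i++)
	{
		printf("%d ", minHeap[i]);
	}
	printf("\n");
	free(minHeap);
}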

Time complexity of idea 2: O(N*logK), space complexity O(K)
Time complexity of idea 1: O(N + K*logN), space complexity O(N)
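Idea 2 also works when the data is read from a file and never held in memory all at once. The sketch below assumes the file contains whitespace-separated integers. TopKFromFile and AdjustDownSmall are hypothetical names for this example, and swep is an assumed definition of the exchange helper.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed definition of the exchange helper. */
void swep(int* p1, int* p2)
{
	int tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

/* Downward adjustment for a small heap. */
void AdjustDownSmall(int* a, int size, int parent)
{
	int child = parent * 2 + 1;
	while (child < size)
	{
		if (child + 1 < size && a[child + 1] < a[child])
			child++;
		if (a[child] >= a[parent])
			break;
		swep(&a[child], &a[parent]);
		parent = child;
		child = parent * 2 + 1;
	}
}

/* Read integers from fp and keep the k largest in heap, a caller-provided
   array of k ints organised as a small heap. Returns the number of values
   stored, which is less than k only when the input holds fewer than k integers. */
int TopKFromFile(FILE* fp, int k, int* heap)
{
	assert(fp && heap);
	if (k <= 0)
		return 0;

	int count = 0;
	int x;
	/* the first k values fill the heap */
	while (count < k && fscanf(fp, "%d", &x) == 1)
		heap[count++] = x;
	for (int i = (count - 2) / 2; i >= 0; i--)
		AdjustDownSmall(heap, count, i);

	/* every later value larger than the top replaces the top */
	while (fscanf(fp, "%d", &x) == 1)
	{
		if (x > heap[0])
		{
			heap[0] = x;
			AdjustDownSmall(heap, count, 0);
		}
	}
	return count;
}
```

Only k integers are held at any time. After the call, heap[0] is the smallest of the k largest values, and the other k - 1 values are in heap order, not sorted order.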

Origin blog.csdn.net/Djsnxbjans/article/details/128049964