[Data structure] heap implementation, heap sorting and TOP-K problem

Table of contents

1. The concept and structure of the heap

2. Implementation of the heap

2.1 Initialize the heap

2.2 Destroy the heap

2.3 Take the top element of the heap

2.4 Return the size of the heap

2.5 Judging whether it is empty

2.6 Print Heap

2.7 Inserting elements

2.8 Upward adjustment of the heap

2.9 Popping elements

2.10 Downward adjustment of the heap

3. Heap building time complexity

4. Application of the heap

4.1 Heap sort

4.2 TOP-K problem


1. The concept and structure of the heap

A heap is a data structure whose elements are organized and accessed according to an ordering rule. A heap can be viewed as a complete binary tree in which the value of each node is greater than or equal to the values of its children (a max-heap) or less than or equal to the values of its children (a min-heap). Heaps are often used to solve priority problems, such as repeatedly finding the largest or smallest element.

Properties of the heap:

  • The value of every node is never greater than the value of its parent (max-heap), or never less than the value of its parent (min-heap);
  • The heap is always a complete binary tree.
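
Because a heap is a complete binary tree, it can be stored compactly in an array: the children of index i sit at 2*i+1 and 2*i+2, and the parent of index i sits at (i-1)/2. These are the same index formulas the adjustment functions later in this article use. As a minimal sketch, the helper below checks the min-heap property of an int array. The name IsMinHeap is made up for illustration and is not part of the article's interface.

```c
#include <stdbool.h>

// Hypothetical helper (not part of the article's interface):
// returns true if a[0..n-1] satisfies the min-heap property.
bool IsMinHeap(const int* a, int n)
{
	// Every non-root node i must be >= its parent at (i - 1) / 2.
	for (int i = 1; i < n; i++)
	{
		int parent = (i - 1) / 2;
		if (a[i] < a[parent])
		{
			return false;
		}
	}
	return true;
}
```

For example, the array 1 3 2 7 4 passes the check, while 5 3 8 does not, because 3 is smaller than its parent 5.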

2. Implementation of the heap

This section implements a min-heap (small root heap); a max-heap (big root heap) needs only small changes to the comparisons. The heap provides the following interface functions:

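// Initialize the heap
void HeapInit(HP* php);
// Destroy the heap
void HeapDestory(HP* php);
// Insert an element
void HeapPush(HP* php, HPDataType x);
// Upward adjustment algorithm (x is the index of the node to move up)
void AdjustUp(HPDataType* a, int x);
// Pop the top element of the heap
void HeapPop(HP* php);
// Downward adjustment algorithm (x is the index of the node to move down)
void AdjustDwon(HPDataType* a, int size, int x);
// Get the top element of the heap
HPDataType HeapTop(HP* php);
// Return the size of the heap
int HeapSize(HP* php);
// Check whether the heap is empty
bool HeapEmpty(HP* php);
// Print the heap
void HeapPrint(HP* php);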

Heap definition:

typedef int HPDataType;
typedef struct Heap
{
	HPDataType* a;
	int size;
	int capacity;
}HP;

The simple interface functions are listed without detailed explanation. The parts of the heap that really need study are the upward adjustment algorithm and the downward adjustment algorithm, which are called when an element is inserted and when an element is popped, respectively.

2.1 Initialize the heap

void HeapInit(HP* php)
{
	assert(php);
	php->a = NULL;
	php->size = php->capacity = 0;
}

2.2 Destroy the heap

void HeapDestory(HP* php)
{
	assert(php);
	free(php->a);
	php->a = NULL;
	php->size = php->capacity = 0;
}

2.3 Take the top element of the heap

HPDataType HeapTop(HP* php)
{
	assert(php);
	assert(php->size > 0); // an empty heap has no top element
	return php->a[0];
}

2.4 Return the size of the heap

int HeapSize(HP* php)
{
	assert(php);
	return php->size;
}

2.5 Judging whether it is empty

bool HeapEmpty(HP* php)
{
	assert(php);
	return php->size == 0;
}

2.6 Print Heap

void HeapPrint(HP* php)
{
	assert(php);
	for (int i = 0; i < php->size; i++)
	{
		printf("%d ", php->a[i]);
	}
	printf("\n");
}

2.7 Inserting elements

To insert an element, we append it to the end of the heap. Because the heap is stored in an array, this simply means writing the element at the end of the array. Appending alone may break the heap property, so we then call the upward adjustment function to restore the ordering between the new node and its ancestors.

Before inserting, we must check whether the heap has enough capacity. If the array is full, it is expanded: the first allocation holds 4 elements, and every later expansion doubles the current capacity.

void HeapPush(HP* php, HPDataType x)
{
	assert(php);
	if (php->size == php->capacity)
	{
		int newCapacity = php->capacity == 0 ? 4 : php->capacity * 2;
		HPDataType* tmp = (HPDataType*)realloc(php->a, sizeof(HPDataType) * newCapacity);
		if (tmp == NULL)
		{
			printf("realloc fail\n");
			exit(-1);
		}
		php->a = tmp;
		php->capacity = newCapacity;
	}

	php->a[php->size] = x;
	AdjustUp(php->a, php->size); // adjust upward from the newly inserted index
	php->size++;
}

2.8 Upward adjustment of the heap

Inserting an element uses the upward adjustment algorithm of the heap. Let's look at how it is implemented.

For example, insert 10 at the end of the array, then run the upward adjustment until the heap property is satisfied again.

[Figure: graphical process of the upward adjustment]

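// Helper used by the adjustment functions: exchange two elements
void Swap(HPDataType* x, HPDataType* y)
{
	HPDataType tmp = *x;
	*x = *y;
	*y = tmp;
}

void AdjustUp(HPDataType* a, int x)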
{
	int child = x;
	int parent = (child - 1) / 2;
	while (child > 0)
	{
		if (a[child] < a[parent])
		{
			Swap(&a[child], &a[parent]);
		}
		else
		{
			break;
		}
		child = parent;
		parent = (child - 1) / 2;
	}
}

Code analysis: 

  1. Initialize the variable child as node x, and parent as the index of its parent node, that is, (child - 1) / 2.
  2. Enter a loop that will be executed until node x floats up to a suitable position or reaches the top of the heap.
  3. In the loop, judge whether the value of node x is less than the value of its parent node, and if so, exchange the values of the two.
  4. If the value of node x is not less than the value of the parent node, jump out of the loop, because the nature of the heap is satisfied at this time.
  5. Update the values of child and parent, update child to parent, and update parent to the index of its parent node, that is, (child - 1) / 2.
  6. Repeat steps 3-5 until the value of node x is greater than or equal to the value of its parent node, or the top of the heap is reached.
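
To make the steps above concrete, here is a self-contained sketch of the same upward adjustment for a min-heap of ints. The name SiftUpMin and the returned swap count are assumptions added for illustration; the loop itself follows the article's AdjustUp.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-alone version of the upward adjustment (min-heap).
// Moves a[child] up until its parent is not larger, and returns the
// number of swaps performed so the trace can be checked.
int SiftUpMin(int* a, int child)
{
	assert(a != NULL && child >= 0);
	int swaps = 0;
	while (child > 0)
	{
		int parent = (child - 1) / 2;
		if (a[child] >= a[parent])
		{
			break; // heap property already holds
		}
		int tmp = a[child];
		a[child] = a[parent];
		a[parent] = tmp;
		swaps++;
		child = parent;
	}
	return swaps;
}
```

Taking the min-heap 15 18 19 25 28 34 65 49 37 as an example input (chosen here for illustration) and appending 10 at index 9, the call SiftUpMin(a, 9) swaps 10 with 28, then with 18, then with 15, leaving 10 15 19 25 18 34 65 49 37 28.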

2.9 Popping elements

Popping an element means deleting the element at the top of the heap. We cannot simply remove it and shift the rest of the array forward, because that would destroy the structure of the heap. The correct way is to first exchange the top element with the last element, then decrease the size by one. After the exchange, the left and right subtrees of the root are still valid heaps, so a single call to the downward adjustment function starting at the root restores the heap.

void HeapPop(HP* php)
{
	assert(php);
	assert(php->size > 0); // cannot pop from an empty heap
	Swap(&php->a[0], &php->a[php->size-1]);
	php->size--;
	AdjustDwon(php->a, php->size, 0);
}

2.10 Downward adjustment of the heap

Downward adjustment of the heap: at each step, compare the parent node with the smaller of its left and right children (for a min-heap). If the child is smaller, exchange the two and continue from the child's position; otherwise stop.

void AdjustDwon(HPDataType* a, int size, int x)
{
	int parent = x;
	int child = parent * 2 + 1;
	while (child < size)
	{
		if (child + 1 < size && a[child + 1] < a[child])
		{
			child++;
		}
		if (a[child] < a[parent])
		{
			Swap(&a[child], &a[parent]);
		}
		else
		{
			break;
		}
		parent = child;
		child = parent * 2 + 1;
	}
}

Code analysis:

  1. Initialize the variable parent as node x, and child as the index of its left child node, that is, parent * 2 + 1.
  2. Enter a loop, which will be executed until the node x sinks to a suitable position or has no child nodes.
  3. In the loop, first judge whether the node x has a right child node, and the value of the right child node is less than the value of the left child node, if true, update child to the index of the right child node.
  4. Then judge whether the value of node x is greater than the value of the selected child node, and if so, exchange the values of the two.
  5. If the value of the node x is not greater than the value of the child node, jump out of the loop, because the nature of the heap is satisfied at this time.
  6. Update the values of parent and child, update parent to child, and update child to the index of the left child node of parent, that is, parent * 2 + 1.
  7. Repeat steps 3-6 until the value of node x is less than or equal to the value of its children, or has no children.
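
The pop operation and the downward adjustment can be traced together with a small self-contained sketch. The names SiftDownMin and PopMin are hypothetical; the logic follows the article's AdjustDwon and HeapPop, working directly on an int array instead of the HP struct.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-alone downward adjustment for a min-heap in a[0..size-1].
static void SiftDownMin(int* a, int size, int parent)
{
	int child = parent * 2 + 1; // start with the left child
	while (child < size)
	{
		// Pick the smaller of the two children, if a right child exists.
		if (child + 1 < size && a[child + 1] < a[child])
		{
			child++;
		}
		if (a[child] >= a[parent])
		{
			break; // parent is not larger than either child
		}
		int tmp = a[child];
		a[child] = a[parent];
		a[parent] = tmp;
		parent = child;
		child = parent * 2 + 1;
	}
}

// Removes and returns the minimum; *size is decreased by one.
int PopMin(int* a, int* size)
{
	assert(a != NULL && size != NULL && *size > 0);
	int top = a[0];
	a[0] = a[*size - 1];  // swap the top with the last element
	a[*size - 1] = top;
	(*size)--;
	SiftDownMin(a, *size, 0);
	return top;
}
```

With the example heap 10 15 19 25 18 34 65 49 37 28 (size 10), PopMin returns 10. The value 28 moved to the root then sinks past 15 and 18, and the remaining nine elements become 15 18 19 25 28 34 65 49 37.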

3. Heap building time complexity

Because the heap is a complete binary tree, and a full binary tree is a special case of a complete binary tree, we use a full binary tree in the proof for simplicity (time complexity is an asymptotic estimate, so a few extra nodes do not affect the final result):

Adjust down:
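
The derivation was originally shown as a figure. A reconstruction under the full-binary-tree assumption might go as follows, where the symbols are assumptions introduced here: h is the height of the tree with the root on level 1, N = 2^h - 1 is the number of nodes, and T(N) is the worst-case total number of moves.

```latex
% Level i holds 2^{i-1} nodes; each moves down at most h - i levels.
\[
\begin{aligned}
T(N) &= \sum_{i=1}^{h-1} 2^{i-1}\,(h - i) \\
     &= 2^{h} - 1 - h \\
     &= N - \log_2(N + 1)
\end{aligned}
\]
```

The many nodes near the bottom move only a few levels, so the sum grows linearly in N.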

Therefore: the time complexity of building the heap by adjusting downward is O(N).

Adjust upwards:
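
As with the downward case, the original derivation was a figure. A reconstruction under the same assumptions (h is the height with the root on level 1, N = 2^h - 1, and T(N) is the worst-case total number of moves; all three symbols are introduced here) might look like this:

```latex
% Level i holds 2^{i-1} nodes; each moves up at most i - 1 levels.
\[
\begin{aligned}
T(N) &= \sum_{i=2}^{h} 2^{i-1}\,(i - 1) \\
     &= (h - 2)\,2^{h} + 2 \\
     &= (N + 1)\bigl(\log_2(N + 1) - 2\bigr) + 2
\end{aligned}
\]
```

Here the many nodes near the bottom are the ones that move the farthest, which is why the total picks up the extra logarithmic factor.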

Therefore: the time complexity of building the heap by adjusting upward is O(N*logN).

4. Application of the heap

4.1 Heap sort

Use the heap to sort the array and print it out:

void testHeapSort()
{
	HP hp;
	HeapInit(&hp);

	int a[] = { 1,4,7,5,10,2,8,9,3,6 };
	for (int i = 0; i < sizeof(a) / sizeof(a[0]); i++)
	{
		HeapPush(&hp, a[i]);
	}
	while (!HeapEmpty(&hp))
	{
		printf("%d ", HeapTop(&hp));
		HeapPop(&hp);
	}
	// Free the memory
	HeapDestory(&hp);
}
int main()
{
	testHeapSort();
	return 0;
}

Output result:

1 2 3 4 5 6 7 8 9 10

However, this method is a bit cumbersome: to sort an array we first have to write a complete heap data structure and copy the data into it. That is not actually necessary. We can modify the code and build the heap directly on the original array.

Ideas:

There are two ways to build a heap on the original array:

The first is to build the heap by upward adjustment. Its time complexity is O(N*logN), so we do not recommend this method.

The second is to build the heap by downward adjustment. Its time complexity is O(N), which is more efficient than building upward, so we recommend it.

A point that is harder to understand: to sort in ascending order we need to build a max-heap, and to sort in descending order we need to build a min-heap. The reason is that each round exchanges the top of the heap with the last element of the unsorted range. With a max-heap the largest remaining value is placed at the end, so the array ends up ascending; with a min-heap the smallest remaining value is placed at the end, so the array ends up descending.

void swap(int* x, int* y)
{
	int tmp = *x;
	*x = *y;
	*y = tmp;
}
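void HeapSort(int* a, int n)
{
	// Start from the last non-leaf node, i.e. the parent of the last element
	for (int i = (n - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDwon(a, n, i); // build the heap by downward adjustment
	}
	int end = n - 1;
	while (end > 0)
	{
		// Move the current top to the end of the unsorted range
		swap(&a[0], &a[end]);
		AdjustDwon(a, end, 0); // re-adjust the remaining elements in [0, end)
		--end;
	}
}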
int main()
{
	int a[] = { 1,4,7,5,10,2,8,9,3,6 };
	int n = sizeof(a) / sizeof(a[0]);
	HeapSort(a,n); // heap sort
	for (int i = 0; i < n; i++)
	{
		printf("%d ", a[i]);
	}
	return 0;
}
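
Note that the AdjustDwon shown earlier maintains a min-heap, so the HeapSort above leaves the array in descending order (10 9 8 ... 1 for this input). For ascending order the downward adjustment has to maintain a max-heap instead. The following is a minimal sketch of that variant; the names SwapInt, SiftDownMax and HeapSortAscending are hypothetical and chosen here only to avoid clashing with the article's functions.

```c
#include <assert.h>
#include <stddef.h>

static void SwapInt(int* x, int* y)
{
	int tmp = *x;
	*x = *y;
	*y = tmp;
}

// Downward adjustment for a max-heap: the larger child moves up.
static void SiftDownMax(int* a, int size, int parent)
{
	int child = parent * 2 + 1;
	while (child < size)
	{
		if (child + 1 < size && a[child + 1] > a[child])
		{
			child++; // pick the larger child
		}
		if (a[child] <= a[parent])
		{
			break;
		}
		SwapInt(&a[child], &a[parent]);
		parent = child;
		child = parent * 2 + 1;
	}
}

// Sorts a[0..n-1] in ascending order, in place.
void HeapSortAscending(int* a, int n)
{
	assert(a != NULL || n == 0);
	// Build a max-heap, starting from the last non-leaf node.
	for (int i = (n - 2) / 2; i >= 0; i--)
	{
		SiftDownMax(a, n, i);
	}
	// Repeatedly move the current maximum to the end of the unsorted range.
	for (int end = n - 1; end > 0; end--)
	{
		SwapInt(&a[0], &a[end]);
		SiftDownMax(a, end, 0);
	}
}
```

The only differences from the min-heap version are the two comparison directions inside the downward adjustment.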

4.2 TOP-K problem

TOP-K problem: find the K largest or K smallest elements in a data set. Generally, the amount of data is relatively large.

For example: the top 10 professional players, the world's top 500 companies, the rich list, the 100 most active players in a game, etc.

The simplest and most direct approach to the Top-K problem is sorting. However, if the amount of data is very large, sorting is not advisable (it may not even be possible to load all the data into memory at once). A better way is to use a heap. The basic idea is as follows:

1. Use the first K elements in the data set to build a heap

  • To find the K largest elements, build a min-heap
  • To find the K smallest elements, build a max-heap

2. Compare each of the remaining N-K elements with the top of the heap in turn, and replace the top (then adjust downward) whenever the new element is a better candidate: larger than the top when looking for the K largest, smaller than the top when looking for the K smallest

After the remaining N-K elements have all been compared with the top of the heap, the K elements left in the heap are the K smallest or K largest elements sought.

Practical application: find the ten largest numbers among 10,000,000 random numbers

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <time.h>

void AdjustDwon(int* a, int size, int x)
{
	int parent = x;
	int child = parent * 2 + 1;
	while (child < size)
	{
		if (child + 1 < size && a[child + 1] < a[child])
		{
			child++;
		}
		if (a[child] < a[parent])
		{
			int tmp = a[child];
			a[child] = a[parent];
			a[parent] = tmp;
		}
		else
		{
			break;
		}
		parent = child;
		child = parent * 2 + 1;
	}
}

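void PrintTopK(int* a, int n, int k)
{
	int* KMaxHeap = (int*)malloc(sizeof(int) * k);
	assert(KMaxHeap);
	// Copy the first k elements as the initial candidates
	for (int i = 0; i < k; i++)
	{
		KMaxHeap[i] = a[i];
	}
	// Build a min-heap: the top is the smallest of the k candidates
	for (int i = (k - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDwon(KMaxHeap, k, i);
	}
	// Compare the remaining elements of a with the top of the heap in turn
	for (int i = k; i < n; i++)
	{
		if (a[i] > KMaxHeap[0])
		{
			KMaxHeap[0] = a[i];
			// Re-adjust only when the top was actually replaced
			AdjustDwon(KMaxHeap, k, 0);
		}
	}
	// Print the k largest elements (in heap order, not sorted)
	for (int i = 0; i < k; i++)
	{
		printf("%d ", KMaxHeap[i]);
	}
	printf("\n");
	free(KMaxHeap);
}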
void testTopK()
{
	srand((unsigned int)time(0));
	int n = 10000000;
	int* a = (int*)malloc(sizeof(int) * n);
	assert(a);
	for (int i = 0; i < n; i++)
	{
		a[i] = rand() % n; // a[i] is in the range [0, n - 1]
	}
	// Manually plant the 10 largest values (all greater than n - 1)
	a[2] = n + 3;
	a[122] = n + 5;
	a[1233] = n + 1;
	a[12333] = n + 2;
	a[1322] = n + 8;
	a[2312] = n + 6;
	a[54612] = n + 7;
	a[546612] = n + 9;
	a[5612] = n + 10;
	a[46612] = n + 4;
	PrintTopK(a, n, 10);
	free(a);
}
int main()
{
	testTopK();
	return 0;
}
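
The program above covers the "K largest" case with a min-heap. For the mirrored case described in the basic idea, the K smallest elements with a max-heap, a sketch might look like the following. The names TopKSmallest and SiftDownMaxK are hypothetical, and the caller is assumed to provide an output buffer of at least k ints with 0 < k <= n.

```c
#include <assert.h>
#include <stddef.h>

// Downward adjustment for a max-heap stored in a[0..size-1].
static void SiftDownMaxK(int* a, int size, int parent)
{
	int child = parent * 2 + 1;
	while (child < size)
	{
		if (child + 1 < size && a[child + 1] > a[child])
		{
			child++; // pick the larger child
		}
		if (a[child] <= a[parent])
		{
			break;
		}
		int tmp = a[child];
		a[child] = a[parent];
		a[parent] = tmp;
		parent = child;
		child = parent * 2 + 1;
	}
}

// Writes the k smallest elements of a[0..n-1] into out[0..k-1]
// (in heap order, not sorted). out[0] is the largest of them.
void TopKSmallest(const int* a, int n, int k, int* out)
{
	assert(a != NULL && out != NULL && k > 0 && k <= n);
	// Step 1: build a max-heap from the first k elements.
	for (int i = 0; i < k; i++)
	{
		out[i] = a[i];
	}
	for (int i = (k - 2) / 2; i >= 0; i--)
	{
		SiftDownMaxK(out, k, i);
	}
	// Step 2: an element smaller than the top replaces the top.
	for (int i = k; i < n; i++)
	{
		if (a[i] < out[0])
		{
			out[0] = a[i];
			SiftDownMaxK(out, k, 0);
		}
	}
}
```

For the input 9 4 7 1 8 2 6 with k = 3, the heap ends up holding 1, 2 and 4, with 4 at the top. Only k elements are kept in the heap at any time, which is what makes the approach suitable for data that is too large to sort in memory.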

Origin blog.csdn.net/m0_73648729/article/details/132268305