Data structures: applications of the heap (heap sort and the top-K problem)



Heap sort

Heap sort first builds a heap from the data, then sorts by repeatedly applying the idea behind heap deletion.

  1. Build a heap from the array to be sorted
  2. Swap the element at the top of the heap with the last element of the unsorted part of the array
  3. Adjust the new top element downward so that the heap property is restored

Repeat steps 2 and 3 until only one element is left in the heap.

Building the heap

  • For descending order, build a min-heap (each parent is less than or equal to its children)
  • For ascending order, build a max-heap (each parent is greater than or equal to its children)

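There are two ways to build a heap: building upward (adjusting each element up, as in insertion) and building downward (adjusting each parent down). Building downward is the better choice: it runs in O(N) time, while building upward takes O(N log N) in the worst case.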
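The difference between the two build strategies can be made visible by counting swaps. The following is a minimal sketch, not code from the original article: the names `SiftUp`, `SiftDown`, `BuildHeapUp` and `BuildHeapDown` are chosen here for illustration, and both builders produce a min-heap and return the number of swaps they performed.

```c
#include <stdio.h>

void Swap(int* a, int* b)
{
	int tmp = *a;
	*a = *b;
	*b = tmp;
}

// Move data[child] up until its parent is not larger; returns the swap count.
long SiftUp(int* data, int child)
{
	long swaps = 0;
	while (child > 0)
	{
		int parent = (child - 1) / 2;
		if (data[child] >= data[parent])
		{
			break;
		}
		Swap(&data[child], &data[parent]);
		child = parent;
		swaps++;
	}
	return swaps;
}

// Move data[parent] down until both children are not smaller; returns the swap count.
long SiftDown(int* data, int parent, int size)
{
	long swaps = 0;
	int child = parent * 2 + 1;
	while (child < size)
	{
		if (child + 1 < size && data[child + 1] < data[child])
		{
			child++;
		}
		if (data[parent] <= data[child])
		{
			break;
		}
		Swap(&data[parent], &data[child]);
		parent = child;
		child = parent * 2 + 1;
		swaps++;
	}
	return swaps;
}

// Build upward: treat the array as a sequence of insertions.
long BuildHeapUp(int* data, int n)
{
	long swaps = 0;
	for (int i = 1; i < n; i++)
	{
		swaps += SiftUp(data, i);
	}
	return swaps;
}

// Build downward: start at the last parent and walk back to the root.
long BuildHeapDown(int* data, int n)
{
	long swaps = 0;
	for (int i = (n - 1 - 1) / 2; i >= 0; i--)
	{
		swaps += SiftDown(data, i, n);
	}
	return swaps;
}
```

Running both builders on copies of the same array and printing the two counts with `printf("%ld %ld\n", up, down)` shows the gap. An array sorted in descending order is a bad case for the upward build, because every newly inserted element has to travel all the way to the root.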

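Building the heap downward: start from the parent of the last node, walk backward through the array to index 0, and adjust each node downward.
For example, build a min-heap from the array {16, 72, 31, 94, 53, 23}. The last parent is index 2 (value 31), which swaps with its child 23. Index 1 (value 72) then swaps with its smaller child 53. Index 0 (value 16) is already smaller than both children, so the result is {16, 53, 23, 94, 72, 31}.
Why can't we start from the first element of the array? Because downward adjustment assumes that the left and right subtrees of the node being adjusted are already heaps of the required kind. An empty tree or a tree with a single node trivially qualifies, so starting at the last parent and moving backward guarantees this precondition at every step.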
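The precondition above can be checked mechanically. The sketch below is an illustration only: `LastParentIndex` and `IsMinHeap` are hypothetical helper names, not functions from the article. The first computes where the downward build starts, the second verifies that every parent is less than or equal to its children.

```c
#include <stdbool.h>

// Index of the last node that has at least one child in a heap of n elements.
// Returns -1 when n < 2, because no node has a child then.
int LastParentIndex(int n)
{
	return n < 2 ? -1 : (n - 1 - 1) / 2;
}

// Returns true if data[0..n) satisfies the min-heap property.
// Each child is compared against its parent at (child - 1) / 2.
bool IsMinHeap(const int* data, int n)
{
	for (int child = 1; child < n; child++)
	{
		int parent = (child - 1) / 2;
		if (data[parent] > data[child])
		{
			return false;
		}
	}
	return true;
}
```

An empty array and a one-element array both pass `IsMinHeap`, which matches the statement that such trees count as heaps.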

Sorting with the heap-deletion idea

  • Swap the top of the heap with the last element of the unsorted part of the array
  • Adjust the new top element downward so that the heap property is preserved
  • Swap the new top of the heap with the new last element of the unsorted part

Repeat the steps above to complete the sort.
This also explains why a max-heap is built for ascending order and a min-heap for descending order. The top of a min-heap is always the smallest element in the heap. Swapping it with the last unsorted element puts the smallest element in the last position of the array; repeating the step puts the second smallest element in the second-to-last position, and so on, so the array ends up in descending order.
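For the ascending case, the same procedure works with a max-heap. The following is a minimal sketch under that assumption; `AdjustDownMax` and `HeapSortAscending` are names introduced here for illustration and mirror the min-heap version used in the article.

```c
#include <stdio.h>

void Swap(int* a, int* b)
{
	int tmp = *a;
	*a = *b;
	*b = tmp;
}

// Adjust downward in a max-heap: the parent must be >= both children.
void AdjustDownMax(int* data, int parent, int size)
{
	int child = parent * 2 + 1;
	while (child < size)
	{
		// Pick the larger of the two children, staying in bounds.
		if (child + 1 < size && data[child + 1] > data[child])
		{
			child++;
		}
		if (data[parent] >= data[child])
		{
			break;
		}
		Swap(&data[parent], &data[child]);
		parent = child;
		child = parent * 2 + 1;
	}
}

// Ascending heap sort: build a max-heap, then repeatedly move the
// largest remaining element to the end of the unsorted part.
void HeapSortAscending(int* a, int n)
{
	for (int i = (n - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDownMax(a, i, n);
	}
	for (int end = n - 1; end > 0; end--)
	{
		Swap(&a[0], &a[end]);
		AdjustDownMax(a, 0, end);
	}
}
```

Calling `HeapSortAscending(arr, size)` on an `int` array sorts it in place from smallest to largest.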


Code

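#include <stdio.h>

typedef int HPDataType;

// Swap two elements
void swap(HPDataType* a, HPDataType* b)
{
	HPDataType tmp = *a;
	*a = *b;
	*b = tmp;
}

// Adjust downward in a min-heap. For node i, the left child is 2 * i + 1 and the right child is 2 * i + 2
void AdjustDown(HPDataType* data, int parent, int size)
{
	int child = parent * 2 + 1;

	while (child < size)
	{
		// Stay in bounds; pick the smaller of the two children
		if (child + 1 < size && data[child] > data[child + 1])
		{
			child++;
		}

		if (data[parent] > data[child])
		{
			swap(&data[parent], &data[child]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			break;
		}
	}
}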


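// Heap sort on an array
// Build the heap first    ascending: max-heap     descending: min-heap
// For descending order, build a min-heap, move the top element to the end of the array, then select a new top element
void HeapSort(int* a, int n)
{
	// Build the heap
	// Building upward is similar to inserting the elements one by one
	//for (int i = 0; i < n; i++)
	//{
	//	AdjustUp(a, i);
	//}

	// Building downward. Precondition: the left and right subtrees of the node are both heaps of the same kind
	// Walk backward from the last non-leaf node, adjusting downward
	// n is the element count, n - 1 is the index of the last element, (child - 1) / 2 == parent
	for (int i = (n - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(a, i, n);
	}

	// Swap the top element with the last unsorted element, restore the heap, then repeat with the new last element
	int end = n - 1;
	while (end > 0)
	{
		swap(&a[0], &a[end]);
		AdjustDown(a, 0, end);
		end--;
	}
}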

int main()
{
	int arr[] = { 16, 72, 31, 23, 94, 53 };
	int size = sizeof(arr) / sizeof(arr[0]);

	HeapSort(arr, size);
	for (int i = 0; i < size; i++)
	{
		printf("%d ", arr[i]);
	}
	printf("\n");
	return 0;
}

The top-K problem

The top-K problem is to select the K largest numbers from N numbers, where N is much larger than K.
For example: we randomly generate 10,000 numbers less than 1,000,000 and find the 5 largest among them.

Approach

First build a min-heap from the first 5 numbers, then traverse the remaining 9,995 numbers. If a number is greater than the top of the heap, replace the top of the heap with it and adjust downward to restore the min-heap. Continue until all 9,995 numbers have been traversed. The 5 numbers left in the heap are then the 5 largest of the 10,000. A min-heap is used because its top is the smallest of the current candidates, which is exactly the one to evict. With a max-heap, the top could already be the largest of all N numbers from the start, and no later number could ever replace it.
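The approach can be sketched on an in-memory array before dealing with file input. This is an illustrative variant, not the article's code: `TopK` is a hypothetical function that assumes `heap` has room for `k` values and that `1 <= k <= n`.

```c
#include <string.h>

// Adjust downward in a min-heap of the given size.
void AdjustDown(int* data, int parent, int size)
{
	int child = parent * 2 + 1;
	while (child < size)
	{
		if (child + 1 < size && data[child + 1] < data[child])
		{
			child++;
		}
		if (data[parent] <= data[child])
		{
			break;
		}
		int tmp = data[parent];
		data[parent] = data[child];
		data[child] = tmp;
		parent = child;
		child = parent * 2 + 1;
	}
}

// Leaves the k largest values of src[0..n) in heap[0..k), arranged as a min-heap.
// Does nothing if k is out of range.
void TopK(const int* src, int n, int k, int* heap)
{
	if (k <= 0 || k > n)
	{
		return;
	}

	// Seed the heap with the first k values and heapify it.
	memcpy(heap, src, sizeof(int) * k);
	for (int i = (k - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(heap, i, k);
	}

	// Every later value that beats the smallest candidate replaces it.
	for (int i = k; i < n; i++)
	{
		if (src[i] > heap[0])
		{
			heap[0] = src[i];
			AdjustDown(heap, 0, k);
		}
	}
}
```

After the call, `heap[0]` is the smallest of the K largest values, that is, the K-th largest value overall. The remaining entries are in heap order, not sorted order.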

Code

How can we check that the code is correct?
First run the code that creates the data, then open the generated file and overwrite 5 arbitrarily chosen numbers with values greater than 1,000,000. Then comment out the call to the data-creation function and run PrintTopK alone. The 5 edited values should be printed.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void CreateNDate()
{
	// Generate the data
	int n = 10000;
	srand((unsigned)time(0));
	const char* file = "data.txt";
	FILE* fin = fopen(file, "w");
	if (fin == NULL)
	{
		perror("fopen error");
		return;
	}

	for (int i = 0; i < n; ++i)
	{
		int x = rand() % 1000000;
		fprintf(fin, "%d\n", x);
	}

	fclose(fin);
}

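
// Select the K largest numbers from N numbers
// Build a min-heap from the first K numbers (adjusting downward or upward), then traverse the other N - K numbers
// (with a max-heap, the top might already be the largest of the N numbers from the start)
// If a number is greater than the top of the heap, swap it with the top and adjust downward
// After the N - K numbers are traversed, the heap holds the K largest of the N numbers

void swap(int* a, int* b)
{
	int tmp = *a;
	*a = *b;
	*b = tmp;
}

// Min-heap: each parent is less than or equal to its children
void AdjustDown(int* data, int parent, int size)
{
	int child = parent * 2 + 1;

	while (child < size)
	{
		// Stay in bounds; pick the smaller of the two children
		if (child + 1 < size && data[child] > data[child + 1])
		{
			child++;
		}

		if (data[parent] > data[child])
		{
			swap(&data[child], &data[parent]);

			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			break;
		}
	}
}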

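void PrintTopK(int k)
{
	if (k <= 0)
	{
		return;
	}

	const char* file = "data.txt";
	FILE* fin = fopen(file, "r");
	if (fin == NULL)
	{
		perror("fopen error");
		return;
	}

	// Read the first K values
	int* ans = (int*)malloc(sizeof(int) * k);
	if (ans == NULL)
	{
		perror("malloc");
		fclose(fin);
		exit(-1);
	}

	for (int i = 0; i < k; i++)
	{
		if (fscanf(fin, "%d", &ans[i]) != 1)
		{
			fprintf(stderr, "%s holds fewer than %d values\n", file, k);
			free(ans);
			fclose(fin);
			return;
		}
	}

	// Build a min-heap from the first K values
	for (int i = (k - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(ans, i, k);
	}

	// Read the remaining values. Testing the fscanf result avoids the extra
	// iteration that while (!feof(fin)) performs after the last value
	int val = 0;
	while (fscanf(fin, "%d", &val) == 1)
	{
		if (val > ans[0])
		{
			ans[0] = val;
			AdjustDown(ans, 0, k);
		}
	}

	// Print the result
	for (int i = 0; i < k; i++)
	{
		printf("%d ", ans[i]);
	}
	printf("\n");

	free(ans);
	fclose(fin);
}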


int main()
{
	CreateNDate();

	int k = 0;
	scanf("%d", &k);
	PrintTopK(k);
	return 0;
}
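The manual check described before the code (overwriting five values by hand) can be complemented by an automated cross-check. The sketch below is not part of the original article and swaps in a different technique: it sorts a copy of the data with the standard `qsort` and compares the K largest values against the heap result. `VerifyTopK` and `CompareIntDesc` are hypothetical names, and the sketch assumes all N values fit in memory.

```c
#include <stdlib.h>
#include <string.h>

// qsort comparator for descending order.
int CompareIntDesc(const void* a, const void* b)
{
	int x = *(const int*)a;
	int y = *(const int*)b;
	return (x < y) - (x > y);
}

// Returns 1 if result[0..k) holds exactly the k largest values of data[0..n),
// 0 if it does not (or k is out of range), and -1 if memory allocation fails.
int VerifyTopK(const int* data, int n, const int* result, int k)
{
	if (k <= 0 || k > n)
	{
		return 0;
	}

	int* all = (int*)malloc(sizeof(int) * n);
	int* got = (int*)malloc(sizeof(int) * k);
	if (all == NULL || got == NULL)
	{
		free(all);
		free(got);
		return -1;
	}

	// Sort copies so the caller's arrays stay untouched.
	memcpy(all, data, sizeof(int) * n);
	memcpy(got, result, sizeof(int) * k);
	qsort(all, n, sizeof(int), CompareIntDesc);
	qsort(got, k, sizeof(int), CompareIntDesc);

	// The first k entries of the sorted data are the expected answer.
	int ok = memcmp(all, got, sizeof(int) * k) == 0;

	free(all);
	free(got);
	return ok;
}
```

Sorting all N values costs O(N log N), so this is only suitable as a test aid; the heap approach remains the one to use when N is large.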

Summary

The above is my understanding of the applications of the heap!

Origin blog.csdn.net/li209779/article/details/132236113