"Heap sort" and "Top-k"

Table of contents

Foreword:

About "Heap Sort":

Step 1: Build a heap

Step 2: Sort

"Top-K Problem"

Regarding Top-k questions:


Foreword:

The previous blog introduced the basic concept of a "Heap", so we can now use a heap to solve practical problems. This article covers two common applications: "Sort" and the "Top-k Problem". The previous blog is: Simulation Implementation of "Heap"-CSDN Blog

 

About "Heap Sort":

#define _CRT_SECURE_NO_WARNINGS  1
#include<stdio.h>

void swap(int* a, int* b)
{
	int tmp = *a;
	*a = *b;
	*b = tmp;
}

void AdjustDown(int* arr, int sz, int parent)
{
	int child = parent * 2 + 1;
	while (child < sz)
	{
		if (child + 1 < sz && arr[child] < arr[child + 1])
		{
			child++;
		}

		if (arr[child] > arr[parent])
		{
			swap(&arr[child], &arr[parent]);
			parent = child;
			child = 2 * parent + 1;
		}
		else
		{
			break;
		}
	}
}

void AdjustUp(int* arr, int sz, int child)
{
	while (child > 0)
	{
		int parent = (child - 1) / 2;
		if (arr[parent] < arr[child])
		{
			swap(&arr[parent], &arr[child]);
			child = parent;
		}
		else
		{
			// the parent is not smaller, so the large-heap property already holds
			break;
		}
	}
}

int main()
{
	int arr[] = { 2, 6, 9, 3, 1, 7 };
	int sz = sizeof(arr) / sizeof(arr[0]);
	for (int i = (sz - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(arr, sz, i);
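	} // build a large heap with the downward adjustment algorithm

	// Alternative: build the heap with the upward adjustment algorithm
	//for (int i = 1; i < sz; i++)
	//{
	//	AdjustUp(arr, sz, i);
	//}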

	int end = sz - 1;
	while (end > 0)
	{
		swap(&arr[0], &arr[end]);
		AdjustDown(arr, end, 0);
		--end;
	}
	// print the sorted array
	for (int i = 0; i < sz; i++)
	{
		printf("%d ", arr[i]);
	}
	printf("\n");
	return 0;
}

Step 1: Build a heap

A heap makes it convenient to sort an unordered array. To sort in ascending order, we first build a large heap (max-heap).

Why not build a small heap (min-heap)?

As explained in the previous blog, in a small heap the top element is the smallest, and every other node is no smaller than its parent. So the smallest number is already in the first position. But to find the next smallest element, we would have to rebuild a heap from the remaining elements, and repeat this for every position. That has high time complexity and is not suitable for sorting.

Therefore, we build a large heap. Once the large heap is built, we swap the first and last elements, which puts the largest element at the end. Then we adjust downward over the remaining n-1 elements to restore the heap, and swap again. Repeating this sorts the array.

	int arr[] = { 2, 6, 9, 3, 1, 7 };
	int sz = sizeof(arr) / sizeof(arr[0]);

	for (int i = 1; i < sz; i++)
	{
		AdjustUp(arr, sz, i);
	}

Adjusting upward one element at a time, as in the loop above, turns the array into a large heap.

 

Step 2: Sort

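First, swap the first and last elements, so that the largest element lands in its final position.

Then adjust downward over all elements except the last one, so that they form a large heap again.

Repeat these two steps, shrinking the range by one element each time.

The final array is in ascending order: 1, 2, 3, 6, 7, 9.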

This completes the heap sort.

 

"Top-K Problem"

Regarding Top-k questions:

The Top-K problem is to find the K largest or K smallest elements in a data set. The data set is usually large.

For example: top 10 professionals, Fortune 500, rich list, top 100 active players in the game, etc. Let's take finding the K largest elements among n values as the example (the code below uses n = 10000000 and k = 5):

 

#define _CRT_SECURE_NO_WARNINGS 1
#include<stdio.h>
#include<stdlib.h>
#include<time.h>
const char* file = "data.txt";
void swap(int* a, int* b)
{
	int tmp = *a;
	*a = *b;
	*b = tmp;
}

void AdjustDown(int* arr, int sz, int parent)
{
	int child = 2 * parent + 1;
	while (child < sz)
	{
		if (child + 1 < sz && arr[child + 1] < arr[child])
		{
			child++;
		}

		if (arr[child] < arr[parent])
		{
			swap(&arr[child], &arr[parent]);
			parent = child;
			child = 2 * parent + 1;
		}
		else
		{
			break;
		}
	}
}

void CreateFile()
{
	// Seed the random number generator
	srand((unsigned int)time(NULL));
	FILE* Fin = fopen(file, "w");
	if (Fin == NULL)
	{
		perror("fopen error");
		exit(-1);
	}

	// Write n values, all in the range [0, n)
	int n = 10000000;
	for (int i = 0; i < n; i++)
	{
		// reduce rand() first so the addition cannot overflow int
		int x = (rand() % n + i) % n;
		fprintf(Fin, "%d\n", x);
	}

	fclose(Fin);
	Fin = NULL;
}

void Print()
{
	FILE* Fout = fopen(file, "r");
	if (Fout == NULL)
	{
		perror("fopen error");
		exit(-1);
	}

	// Put the first k numbers into a small heap
	// (assumes the file holds at least k numbers)
	int k = 5;
	int* minheap = (int*)malloc(sizeof(int) * k);
	if (minheap == NULL)
	{
		perror("minheap -> malloc");
		fclose(Fout);
		return;
	}

	for (int i = 0; i < k; i++)
	{
		fscanf(Fout, "%d", &minheap[i]);
	}

	for (int i = (k - 1 - 1) / 2; i >= 0; --i)
	{
		AdjustDown(minheap, k, i);
	}

	// Read the remaining data; a value larger than the heap top replaces it
	int x = 0;
	while (fscanf(Fout, "%d", &x) == 1)
	{
		if (minheap[0] < x)
		{
			minheap[0] = x;
			AdjustDown(minheap, k, 0);
		}
	}

	for (int i = 0; i < k; i++)
	{
		printf("%d ", minheap[i]);
	}

	free(minheap);
	minheap = NULL;
	fclose(Fout);
	Fout = NULL;
}

int main()
{
	//CreateFile(); // run once to generate data.txt, then comment it out again
	Print();
	return 0;
}

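First, we create 10000000 random numbers. Each one is taken modulo n, so all of them are smaller than 10000000. Then we change 5 randomly chosen numbers in the file to

10000001, 10000002, 10000003, 10000004, 10000005

These five values are larger than everything else in the file, so they are exactly the result we expect.

Next, build a small heap. Note that this must be a small heap!

If we built a large heap and 10000005 was read early, it would sit at the top of the heap. Every later number, including the next largest ones, is smaller than the top and could never enter the heap. In a small heap the top is the smallest of the K candidates, so any larger number can replace it. That is why we use a small heap!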

Then put the first 5 numbers of the data into the small heap.

Next, traverse the remaining 9999995 numbers and compare each one with the top element of the heap. If a number is greater than the top element, it replaces the top directly.

After each replacement, adjust downward again. Once the entire data set has been traversed, the heap holds the five largest values:

10000001, 10000002, 10000003, 10000004, 10000005

Origin blog.csdn.net/weixin_72917087/article/details/135478100