[Data Structures -- C Language] Solving the Top-K Problem with a Heap: So This Is How King of Glory Rankings Work

Table of contents

1. What is a Top-K problem?

1.1 Basic idea of Top-K

2. Logical analysis of Top-K problems

2.1 Build a heap, a small heap of size K

2.2 Compare the remaining N - K elements with the heap top in turn, and replace the top when an element is larger

2.3 Print Heap

3. TopK implementation code

4. The complete code of the Top-K problem

5. Results


Introducing the Top-K problem:
Anyone who plays King of Glory has seen rankings such as "hero No. xxx in xxx city" or "No. xxx for hero xxx in xxx district". Likewise, when ordering takeaway on Meituan/Ele.me, we can sort the shops by distance or by rating and see the top x shops in order. The top 10 of the Forbes list and the top 5 of the Hurun rich list work the same way. All of these problems pick the K largest items out of a large amount of data, and the Top-K algorithm is the standard way to solve them.

1. What is a Top-K problem?

TOP-K problem: find the K largest or the K smallest elements of a data set, where the amount of data is usually large.
For example: the top 10 professional players, the world's top 500 companies, rich lists, the 100 most active players in a game, etc.

For the Top-K problem, the most direct approach is to sort the data. However, when the amount of data is very large, sorting is impractical (the data may not fit in memory all at once). A better approach is to use a heap. The basic idea is as follows:

1.1 Basic idea of Top-K

(1) Build a heap from the first K elements of the data set.
        To find the K largest elements, build a small heap (min-heap).
        To find the K smallest elements, build a large heap (max-heap).

(2) Compare each of the remaining N - K elements with the heap top in turn. Taking the K largest as the example: we build a small heap, so the top is the smallest of the K elements kept so far. If the current element is larger than the top, it replaces the top, and we adjust downward from the root to restore the small heap. If it is not larger, nothing is replaced and the next element is compared with the top. This step repeats until all N - K elements have been processed.
(3) Once all of the remaining N - K elements have been compared with the heap top, the K elements left in the heap are the K largest elements (or, with a large heap, the K smallest).

2. Logical analysis of Top-K problems

(1) Build a small heap from the first K elements;

(2) Compare the remaining N - K elements with the heap top in turn, and replace the top if an element is larger;

(3) Print the heap.
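A rough cost estimate for these three steps, assuming AdjustDown is the usual sift-down on a binary heap whose height is about log2 K (the estimate is for K >= 2, worst case where every element triggers a replacement):

$$
T(N, K) = \underbrace{O(K)}_{\text{build small heap}} + \underbrace{(N-K)\cdot O(\log K)}_{\text{compare and adjust down}} + \underbrace{O(K)}_{\text{print}} = O(N \log K), \qquad M(K) = O(K)
$$

Sorting everything instead costs O(N log N) time and needs all N values in memory. For N = 10^9 and K = 10, each heap update touches about log2 10 ≈ 3.3 levels, while a sort works with log2 10^9 ≈ 29.9.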

This is the overall logic; let us analyze the three steps one by one:

2.1 Build a heap, a small heap of size K

Process:

1. Allocate an array that can hold K integers;

2. Read the first K numbers into it and adjust downward, starting from the last non-leaf node, to form a small heap. (If the downward adjustment algorithm is unfamiliar, review it before continuing.)

The code is as follows:

int* kminheap = (int*)malloc(sizeof(int) * k);
if (NULL == kminheap)
{
    perror("malloc fail:");
    return;
}

for (int i = 0; i < k; i++)
{
    fscanf(fout, "%d", &kminheap[i]);
}

// Build a small heap: adjust down from the last non-leaf node (k - 1 - 1) / 2 up to the root
for (int i = (k - 1 - 1) / 2; i >= 0; i--)
{
    AdjustDown(kminheap, k, i);
}

2.2 Compare the remaining N - K elements with the heap top in turn, and replace the top when an element is larger

Process:

1. Because we want the K largest values, we build a small heap. Its top is the smallest element in the heap, and each of the remaining N - K elements is compared with it in turn;

2. If the element is larger than the heap top, it replaces the top; if it is not larger, nothing is exchanged and we move on to the next element;

3. After a replacement, adjust downward from the top to rebuild the heap, so that the top is once again the smallest element;

4. When all N - K elements have been compared, the K elements in the heap are the K largest elements.

The code is as follows:

int val = 0;
// fscanf returns 1 only when a number was actually read, so the loop stops cleanly at end of file
while (fscanf(fout, "%d", &val) == 1)
{
    if (val > kminheap[0])
    {
        kminheap[0] = val;
        AdjustDown(kminheap, k, 0);
    }
}

2.3 Print Heap

for (int i = 0; i < k; i++)
{
    printf("%d ", kminheap[i]);
}

3. TopK implementation code

void PrintTopK(int k)
{
	const char* file = "data.txt";
	FILE* fout = fopen(file, "r");
	if (NULL == fout)
	{
		perror("fopen error:");
		return;
	}

	int* kminheap = (int*)malloc(sizeof(int) * k);
	if (NULL == kminheap)
	{
		perror("malloc fail:");
		fclose(fout);
		return;
	}

	// Read the first K numbers
	for (int i = 0; i < k; i++)
	{
		if (fscanf(fout, "%d", &kminheap[i]) != 1)
		{
			fprintf(stderr, "data.txt holds fewer than %d numbers\n", k);
			free(kminheap);
			fclose(fout);
			return;
		}
	}

	// Build a small heap
	for (int i = (k - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(kminheap, k, i);
	}

	// Compare the remaining N - K numbers with the heap top; fscanf returns 1 only on a successful read
	int val = 0;
	while (fscanf(fout, "%d", &val) == 1)
	{
		if (val > kminheap[0])
		{
			kminheap[0] = val;
			AdjustDown(kminheap, k, 0);
		}
	}

	for (int i = 0; i < k; i++)
	{
		printf("%d ", kminheap[i]);
	}
	printf("\n");

	free(kminheap);
	fclose(fout);
}

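This version reads its input from the file data.txt, into which the test data is written beforehand. It relies on the small-heap AdjustDown function and needs <stdio.h> and <stdlib.h>.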

4. The complete code of the Top-K problem

We first generate 1000 random numbers and store them in a file. The Top-K routine then reads them back from that file.

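#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// Small-heap downward adjustment from the heap code (not repeated here; signature assumed from its calls)
void AdjustDown(int* a, int n, int parent);

void CreateNData()
{
	// Generate test data: n random numbers, one per line
	int n = 1000;
	srand((unsigned int)time(NULL));
	const char* file = "data.txt";
	FILE* fin = fopen(file, "w");
	if (NULL == fin)
	{
		perror("fopen error:");
		return;
	}

	for (int i = 0; i < n; i++)
	{
		int x = rand() % 100000;
		fprintf(fin, "%d\n", x);
	}

	fclose(fin);
}

void PrintTopK(int k)
{
	const char* file = "data.txt";
	FILE* fout = fopen(file, "r");
	if (NULL == fout)
	{
		perror("fopen error:");
		return;
	}

	int* kminheap = (int*)malloc(sizeof(int) * k);
	if (NULL == kminheap)
	{
		perror("malloc fail:");
		fclose(fout);
		return;
	}

	// Read the first K numbers
	for (int i = 0; i < k; i++)
	{
		if (fscanf(fout, "%d", &kminheap[i]) != 1)
		{
			fprintf(stderr, "data.txt holds fewer than %d numbers\n", k);
			free(kminheap);
			fclose(fout);
			return;
		}
	}

	// Build a small heap
	for (int i = (k - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(kminheap, k, i);
	}

	// Compare the remaining N - K numbers with the heap top
	int val = 0;
	while (fscanf(fout, "%d", &val) == 1)
	{
		if (val > kminheap[0])
		{
			kminheap[0] = val;
			AdjustDown(kminheap, k, 0);
		}
	}

	for (int i = 0; i < k; i++)
	{
		printf("%d ", kminheap[i]);
	}
	printf("\n");

	free(kminheap);
	fclose(fout);
}

int main()
{
	CreateNData();  // run once to create data.txt, then comment this call out
	PrintTopK(5);
	return 0;
}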

5. Results

Quick verification technique: the data lives in a file, so there is a quick way to check whether the code is correct. Call the data-generating function once, then comment that call out. Open data.txt and change five of the numbers to values larger than anything rand() % 100000 can produce (100000 or more), then run the print again. If exactly those five values are printed, the code works.

Comparing the two pictures shows that the program prints the top 5 largest values.

*** End of article ***


Origin blog.csdn.net/Ljy_cx_21_4_3/article/details/131068681