TopK problem (solved with heap)

Let's continue with the TopK problem from above. A TopK problem usually comes up when there is a very large amount of data and k is a small value: we need to find the k largest (or k smallest) numbers. If the data set is not very large, we can simply write a loop, insert every number into a heap, and adjust upward. But when the data set is very large, inserting everything into a heap costs too much, so here we only allocate storage for k elements and build the heap in that. We store the data in a file so that we can inspect it.

The problem here is to find the largest K numbers among all this data. The first question is whether we should build a max-heap or a min-heap. The answer is a min-heap.

Let's first analyze why.

Suppose we built a max-heap and the largest number happened to enter the heap first. It would sit at the top, and no later number could get in, because a number would have to be larger than the top to enter. It is like a blocked sewer pipe: nothing can get through. So we use a min-heap instead. The top of a min-heap is the smallest of the k numbers kept so far, so any larger number can replace it. What we need first is to malloc space for k elements and build the heap. As discussed above, a heap can also be built by adjusting downward; here we complete this step by reading the first k numbers one at a time and adjusting upward.

The code is as follows.

	// Build a min-heap of k numbers
	int* minheap = (int*)malloc(sizeof(int) * k);
	if (minheap == NULL)
	{
		perror("malloc error");
		return;
	}

	// Read the first k numbers and build the min-heap
	for (int i = 0; i < k; i++)
	{
		fscanf(fout, "%d", &minheap[i]);
		AdjustUp(minheap, i);
	}
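
The snippets in this section call AdjustUp and AdjustDown, which come from the earlier heap section and are not repeated here. As a reminder, here is a minimal sketch of what they might look like for a min-heap of int stored in an array. The Swap helper and the parameter names are assumptions for illustration; only the call shapes, AdjustUp(a, child) and AdjustDown(a, n, parent), are taken from this article.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: swap two ints. */
static void Swap(int* a, int* b)
{
	int tmp = *a;
	*a = *b;
	*b = tmp;
}

/* Move a[child] up until its parent is no larger than it (min-heap). */
void AdjustUp(int* a, int child)
{
	assert(a != NULL && child >= 0);
	while (child > 0)
	{
		int parent = (child - 1) / 2;
		if (a[child] >= a[parent])
			break;
		Swap(&a[child], &a[parent]);
		child = parent;
	}
}

/* Move a[parent] down within a[0..n-1] until both children are no smaller (min-heap). */
void AdjustDown(int* a, int n, int parent)
{
	assert(a != NULL && parent >= 0);
	int child = parent * 2 + 1;
	while (child < n)
	{
		/* Pick the smaller of the two children. */
		if (child + 1 < n && a[child + 1] < a[child])
			++child;
		if (a[child] >= a[parent])
			break;
		Swap(&a[child], &a[parent]);
		parent = child;
		child = parent * 2 + 1;
	}
}
```

If you would rather build the k-element heap by adjusting downward, the usual approach is to fill the array first and then call AdjustDown for every index from (k - 2) / 2 down to 0.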

The rule is then: if a number read from the file is larger than the number at the top of the heap, it overwrites the top, and we adjust downward from the top. Once every number has been compared, the heap holds the K largest numbers, and we can print them all out.

	while (fscanf(fout, "%d", &x) != EOF)
	{
		if (x > minheap[0])
		{
			minheap[0] = x;
			AdjustDown(minheap, k, 0);
		}
	}
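
To see the replacement rule in isolation, here is a small in-memory sketch that applies the same steps to an array instead of a file. The names TopKInMemory and SiftDown are hypothetical and not part of the original code; SiftDown plays the role of AdjustDown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for AdjustDown: restore min-heap order below parent. */
static void SiftDown(int* a, int n, int parent)
{
	int child = parent * 2 + 1;
	while (child < n)
	{
		if (child + 1 < n && a[child + 1] < a[child])
			++child;
		if (a[child] >= a[parent])
			break;
		int tmp = a[child];
		a[child] = a[parent];
		a[parent] = tmp;
		parent = child;
		child = parent * 2 + 1;
	}
}

/* Keep the k largest values of data[0..n-1] in heap[0..k-1].
   The result is in heap order, not sorted. Assumes 0 < k <= n. */
void TopKInMemory(const int* data, int n, int* heap, int k)
{
	assert(data != NULL && heap != NULL && k > 0 && k <= n);

	/* Seed the heap with the first k values, then heapify by sifting down. */
	for (int i = 0; i < k; i++)
		heap[i] = data[i];
	for (int i = (k - 2) / 2; i >= 0; i--)
		SiftDown(heap, k, i);

	/* A value enters only if it beats the smallest of the current top k. */
	for (int i = k; i < n; i++)
	{
		if (data[i] > heap[0])
		{
			heap[0] = data[i];
			SiftDown(heap, k, 0);
		}
	}
}
```

For example, with the input 7, 2, 9, 4, 11, 3, 8, 1, 10, 5 and k = 3, the heap ends up holding 9, 10 and 11, with 9 (the smallest of the three) at index 0.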

Because the data is written to a file, the other knowledge point here is file operations.

We have learned how to read and write files before, and we know that to read or write a file we must first open it. If it is opened for writing ("w"), the file is created if it does not exist, and its contents are cleared first if it does.

After the file is opened, we use the two functions fscanf and fprintf. You can check the documentation. They differ only slightly from scanf and printf: each takes the FILE* stream as an extra first argument.

If you have forgotten the details, you can go back to the previous article to review. The article is titled File Operations (文件操作).
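
As a quick refresher, here is a small sketch of the write-then-read pattern used in this section. The function names WriteNumbers and SumNumbers are made up for the example. Note that the read loop checks that fscanf returned 1 (one value converted) rather than comparing against EOF; this is a slightly more defensive variant than the loop in this article, since it also stops on malformed input.

```c
#include <assert.h>
#include <stdio.h>

/* Write n integers, one per line. "w" creates the file or truncates an existing one.
   Returns 0 on success, -1 on failure. */
int WriteNumbers(const char* path, const int* values, int n)
{
	assert(path != NULL && values != NULL);
	FILE* fp = fopen(path, "w");
	if (fp == NULL)
	{
		perror("fopen error");
		return -1;
	}
	for (int i = 0; i < n; i++)
		fprintf(fp, "%d\n", values[i]);
	return fclose(fp) == 0 ? 0 : -1;
}

/* Read integers until fscanf can no longer convert one and store their sum in *sum.
   Returns the number of values read, or -1 if the file cannot be opened. */
int SumNumbers(const char* path, long* sum)
{
	assert(path != NULL && sum != NULL);
	FILE* fp = fopen(path, "r");
	if (fp == NULL)
	{
		perror("fopen error");
		return -1;
	}
	int x = 0;
	int count = 0;
	*sum = 0;
	while (fscanf(fp, "%d", &x) == 1)
	{
		*sum += x;
		count++;
	}
	fclose(fp);
	return count;
}
```

Writing the five values 3, 1, 4, 1, 5 with WriteNumbers and then calling SumNumbers on the same file reads back five values with a sum of 14.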

Next we write the data to the file, read it back, and put the complete code together.

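#define _CRT_SECURE_NO_WARNINGS

#include<stdio.h>
#include<stdlib.h>
#include<time.h>

// AdjustUp and AdjustDown are the min-heap adjustment functions from the heap section above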

void CreateNDate()
{
	// Generate the data
	int n = 10000000;
	srand((unsigned int)time(0));
	const char* file = "data.txt";
	FILE* fin = fopen(file, "w");
	if (fin == NULL)
	{
		perror("fopen error");
		return;
	}

	for (int i = 0; i < n; ++i)
	{
		// Widen before adding so rand() + i cannot overflow int when RAND_MAX is large
		int x = (int)(((long long)rand() + i) % 10000000);
		fprintf(fin, "%d\n", x);
	}

	fclose(fin);
}

void PrintTopK(const char* file, int k)
{
	FILE* fout = fopen(file, "r");
	if (fout == NULL)
	{
		perror("fopen error");
		return;
	}

	// Build a min-heap of k numbers
	int* minheap = (int*)malloc(sizeof(int) * k);
	if (minheap == NULL)
	{
		perror("malloc error");
		fclose(fout); // do not leak the open file
		return;
	}

	// Read the first k numbers and build the min-heap
	for (int i = 0; i < k; i++)
	{
		fscanf(fout, "%d", &minheap[i]);
		AdjustUp(minheap, i);
	}

	// Each remaining number replaces the top if it is larger than the top
	int x = 0;
	while (fscanf(fout, "%d", &x) != EOF)
	{
		if (x > minheap[0])
		{
			minheap[0] = x;
			AdjustDown(minheap, k, 0);
		}
	}

	// Print the k largest numbers (in heap order, not sorted)
	for (int i = 0; i < k; i++)
	{
		printf("%d ", minheap[i]);
	}
	printf("\n");

	free(minheap);
	fclose(fout);
}

int main()
{
	CreateNDate();
	// Must match the name used in CreateNDate; file names are case-sensitive on some systems
	PrintTopK("data.txt", 5);

	return 0;
}

Of course, there are still a few things to pay attention to here. For example, how do we know which numbers are really the largest, so that we can check the result? The way the random numbers are generated here is interesting and helps with that.

Taking the result modulo 10000000 ensures that every number is in the range 0 to 9999999. Adding i also has a purpose: rand can only produce a limited range of values (RAND_MAX may be as small as 32767), so over ten million calls many values would repeat. Because i keeps changing, adding it spreads the values out and reduces the duplicates. To check whether the code is correct, we only need to change a few of the generated numbers to values larger than 10,000,000; the program should then print exactly those numbers. You can change them directly in the generated file (or, if the data were in an array, by subscripting into it), and you can also inspect the values in the debugger window.
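
The check described above can also be automated. The sketch below is one way to do it, entirely in memory: it generates values below 10000000, overwrites k positions with values above that limit, runs the same min-heap selection, and confirms that only planted values remain in the heap. All names here (SiftDown, TopK, CheckPlanted) and the fixed seed are assumptions for illustration, not part of the original program.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for AdjustDown (min-heap). */
static void SiftDown(int* a, int n, int parent)
{
	int child = parent * 2 + 1;
	while (child < n)
	{
		if (child + 1 < n && a[child + 1] < a[child])
			++child;
		if (a[child] >= a[parent])
			break;
		int tmp = a[child];
		a[child] = a[parent];
		a[parent] = tmp;
		parent = child;
		child = parent * 2 + 1;
	}
}

/* Same replacement rule as PrintTopK, reading from an array;
   the heap is built by sifting down. */
static void TopK(const int* data, int n, int* heap, int k)
{
	for (int i = 0; i < k; i++)
		heap[i] = data[i];
	for (int i = (k - 2) / 2; i >= 0; i--)
		SiftDown(heap, k, i);
	for (int i = k; i < n; i++)
	{
		if (data[i] > heap[0])
		{
			heap[0] = data[i];
			SiftDown(heap, k, 0);
		}
	}
}

/* Returns 1 if the selection finds exactly the k planted values,
   0 if it does not, and -1 if memory allocation fails. */
int CheckPlanted(int n, int k)
{
	assert(k > 0 && k <= n);
	const int limit = 10000000;
	int* data = (int*)malloc(sizeof(int) * n);
	int* heap = (int*)malloc(sizeof(int) * k);
	if (data == NULL || heap == NULL)
	{
		free(data);
		free(heap);
		return -1;
	}

	srand(1); /* fixed seed so the run is repeatable */
	for (int i = 0; i < n; i++)
		data[i] = (int)(((long long)rand() + i) % limit); /* in [0, limit - 1] */

	/* Plant k distinct values above the limit at spread-out positions. */
	for (int j = 0; j < k; j++)
		data[(int)((long long)j * n / k)] = limit + 1 + j;

	TopK(data, n, heap, k);

	/* Every survivor must be a planted value; the smallest one sits at the top. */
	int ok = (heap[0] == limit + 1);
	for (int i = 0; i < k; i++)
		if (heap[i] <= limit)
			ok = 0;

	free(data);
	free(heap);
	return ok;
}
```

Calling CheckPlanted(100000, 5) should return 1 if the selection logic is correct. Breaking the comparison in TopK (for example, flipping > to <) makes it return 0.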

That concludes our content for today. See you next time.

Origin blog.csdn.net/2301_76895050/article/details/134652982