Talking about the Top-K problem

The Top-K problem

Find the k largest (or smallest) of N numbers.
Real-world examples: hero rankings in Honor of Kings (for instance, the 100th-ranked Han Xin in Chang'an District) and food-delivery merchant rankings (for instance, the second-highest Maocai sales in Chang'an District).

1. Idea 1: Sorting (heap sort)

The time complexity is: O(N*log₂N)

To find the k largest numbers, we can sort all N numbers in descending order and output the first k. Sorting in descending order with heap sort requires a min-heap: build the min-heap, swap the root node (the current minimum) with the last leaf node, stop treating that last element as part of the heap, and adjust downward from the root. Repeat until only one element is left in the heap. Refer to heap sort for the detailed analysis.

1.1 Code implementation

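// Min-heap downward adjustment over a[0..n-1], starting at index root (defined separately)
void AdjustDown(int* a, int n, int root);

// Descending order
void HeapSort(int* a, int n)
{
	// Build a min-heap
	for (int i = (n - 1 - 1) / 2; i >= 0; --i)
	{
		AdjustDown(a, n, i);
	}
	int end = n - 1;
	// Swap the smallest value into the last position and stop treating
	// that position as part of the heap.
	// Each pass selects the smallest of the remaining values,
	// filling the array from back to front.
	while (end > 0)
	{
		int tem = a[end];
		a[end] = a[0];
		a[0] = tem;
		AdjustDown(a, end, 0);
		--end;
	}
}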
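
HeapSort above calls AdjustDown, whose body this post leaves to the heap sort write-up. As a rough sketch only, here is one way a min-heap downward adjustment could look, assuming the signature implied by the calls, AdjustDown(a, n, root). The parameter name root and the assert-based check are assumptions, not the author's code.

```c
#include <assert.h>
#include <stddef.h>

/* Sift the value at index `root` down until the subtree rooted there
 * satisfies the min-heap property. `n` is the number of elements that
 * currently belong to the heap. */
void AdjustDown(int* a, int n, int root)
{
	assert(a != NULL && root >= 0);
	int parent = root;
	int child = 2 * parent + 1;           /* left child */
	while (child < n)
	{
		/* pick the smaller of the two children, if a right child exists */
		if (child + 1 < n && a[child + 1] < a[child])
		{
			++child;
		}
		/* parent is already no larger than both children: done */
		if (a[child] >= a[parent])
		{
			break;
		}
		int tmp = a[parent];
		a[parent] = a[child];
		a[child] = tmp;
		parent = child;
		child = 2 * parent + 1;
	}
}
```

Each call does at most one swap per level of the heap, which is where the log₂N factor in the complexity comes from.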

1.2 Test

#include <stdio.h>

int main()
{
	int arr[] = { 27, 28, 65, 25, 15, 34, 19, 49, 18, 37 };
	int n = sizeof(arr) / sizeof(int);
	printf("All data: ");
	for (int i = 0; i < n; i++)
	{
		printf("%d ", arr[i]);
	}
	printf("\n");
	HeapSort(arr, n);
	printf("Top 5 largest: ");
	for (int j = 0; j < 5; j++)
	{
		printf("%d ", arr[j]);
	}
	printf("\n");
	return 0;
}

2. Idea 2: Build a max-heap, remove the largest element, re-adjust the heap, and repeat until the k largest have been selected

The time complexity is: O(N+k*log₂N)
When N is very large and k is small relative to N, the time complexity of this idea approaches O(N)

Because the first idea is not efficient, the second idea is an optimization of it. To find the k largest numbers, build a max-heap, remove the root node (the maximum), and use the downward adjustment algorithm to bring the next-largest number to the root. Remove that one in turn, and repeat until the k largest numbers have been selected.

Building the heap takes O(N), and performing the downward adjustment k times takes O(k*log₂N), so the total time complexity of this idea is O(N+k*log₂N). log₂N grows very slowly as N grows, so as long as k is small relative to N the N term dominates, and the time complexity of this idea approaches O(N) when N is very large.
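
The post gives no code for idea 2, so the following is only a minimal sketch of how it might look. The names AdjustDownMax and TopKByMaxHeap, and the out parameter, are hypothetical and chosen for illustration. The sketch reorders the input array in place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: downward adjustment for a max-heap of n elements. */
static void AdjustDownMax(int* a, int n, int parent)
{
	int child = 2 * parent + 1;
	while (child < n)
	{
		/* pick the larger of the two children */
		if (child + 1 < n && a[child + 1] > a[child])
		{
			++child;
		}
		if (a[child] <= a[parent])
		{
			break;
		}
		int tmp = a[parent];
		a[parent] = a[child];
		a[child] = tmp;
		parent = child;
		child = 2 * parent + 1;
	}
}

/* Hypothetical function: writes the k largest values of a[0..n-1] into out,
 * largest first. The array a is reordered in place. */
void TopKByMaxHeap(int* a, int n, int k, int* out)
{
	assert(a != NULL && out != NULL && k >= 0 && k <= n);
	/* build the max-heap: O(N) */
	for (int i = (n - 2) / 2; i >= 0; --i)
	{
		AdjustDownMax(a, n, i);
	}
	/* remove the root k times: O(k*log N) */
	int size = n;
	for (int j = 0; j < k; ++j)
	{
		out[j] = a[0];
		a[0] = a[size - 1];   /* move the last leaf to the root */
		--size;               /* the heap shrinks by one */
		AdjustDownMax(a, size, 0);
	}
}
```

With the test array used in this post and k = 5, out would hold 65 49 37 34 28. Note that the whole array must be in memory for the heap to be built.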

3. Idea 3: Optimal solution (saves space and is relatively efficient)

The time complexity is: O(k+(N-k)*log₂k)
When N is very large and k is small relative to N, the time complexity of this idea approaches O(N)

If N is so large that the N numbers cannot fit in memory, idea 2 is not feasible, which leads to idea 3. To find the k largest numbers, build a min-heap from the first k numbers, then compare each of the remaining N-k numbers with the top of the heap in turn. If a number is larger than the top of the heap, it replaces the top (overwriting it directly) and the heap is adjusted downward. When the loop ends, the heap holds the k largest numbers.

Building the heap of k numbers takes O(k), and at most N-k downward adjustments are needed, each costing O(log₂k), so that part takes O((N-k)*log₂k) and the total time complexity of idea 3 is O(k+(N-k)*log₂k). In practice N is very large and k is very small compared with N, so log₂k is effectively a small constant and the time complexity of idea 3 also approaches O(N) when N is very large.

3.1 Code implementation

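#include <stdio.h>
#include <stdlib.h>

void TopK(int* a,int k,int n)
{
	// Nothing to do for an invalid k
	if (k <= 0 || k > n)
	{
		return;
	}
	// Allocate space for k ints
	int* heapA = (int*)malloc(k * sizeof(int));
	if (heapA == NULL)
	{
		return;
	}
	for (int i = 0; i < k; ++i)
	{
		heapA[i] = a[i];
	}
	// Build a min-heap of k numbers
	for (int i = (k - 2) / 2; i >= 0; --i)
	{
		AdjustDown(heapA, k, i);
	}
	// Starting at index k, compare each number with heapA[0];
	// if it is larger, overwrite heapA[0] and adjust downward.
	// heapA[0] is always the smallest number in the heap.
	for (int i = k; i < n; i++)
	{
		if (a[i] > heapA[0])
		{
			heapA[0] = a[i];
			AdjustDown(heapA, k, 0);
		}
	}
	// Print (heap order, not sorted order)
	for (int i = 0; i < k; i++)
	{
		printf("%d ", heapA[i]);
	}
	printf("\n");
	free(heapA);
}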

3.2 Test

int main()
{
	int arr[] = { 27, 28, 65, 25, 15, 34, 19, 49, 18, 37 };
	int n = sizeof(arr) / sizeof(int);
	printf("All data: ");
	for (int i = 0; i < n; i++)
	{
		printf("%d ", arr[i]);
	}
	printf("\n");
	TopK(arr, 5, n);
	return 0;
}
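
Since idea 3 only ever keeps k numbers, the input does not have to be an in-memory array. The sketch below is one possible streaming variant in which numbers are fed one at a time. The TopKStream type and its functions are hypothetical names, not part of the original code, and unlike TopK above it fills the heap by upward adjustment, so the fill phase costs O(k*log₂k) rather than O(k).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: a fixed-capacity min-heap holding the k largest
 * values seen so far. The caller supplies the storage. */
typedef struct
{
	int* data;      /* buffer with room for `capacity` ints */
	int  size;      /* number of values currently stored */
	int  capacity;  /* k */
} TopKStream;

static void SwapInt(int* x, int* y)
{
	int tmp = *x;
	*x = *y;
	*y = tmp;
}

void TopKStreamInit(TopKStream* s, int* buffer, int k)
{
	assert(s != NULL && buffer != NULL && k > 0);
	s->data = buffer;
	s->size = 0;
	s->capacity = k;
}

/* Feed one value from the stream. */
void TopKStreamPush(TopKStream* s, int value)
{
	assert(s != NULL);
	if (s->size < s->capacity)
	{
		/* heap not full yet: append the value and adjust upward */
		int child = s->size++;
		s->data[child] = value;
		while (child > 0)
		{
			int parent = (child - 1) / 2;
			if (s->data[parent] <= s->data[child]) break;
			SwapInt(&s->data[parent], &s->data[child]);
			child = parent;
		}
		return;
	}
	if (value <= s->data[0])
	{
		return;   /* not larger than the smallest of the current top k */
	}
	/* overwrite the heap top and adjust downward */
	s->data[0] = value;
	int parent = 0;
	int child = 1;
	while (child < s->size)
	{
		if (child + 1 < s->size && s->data[child + 1] < s->data[child]) ++child;
		if (s->data[child] >= s->data[parent]) break;
		SwapInt(&s->data[parent], &s->data[child]);
		parent = child;
		child = 2 * parent + 1;
	}
}
```

A caller would initialize the structure with a k-element buffer, push every value as it is read (from a file, for example), and then read the k largest values from data[0..size-1]. They come out in heap order, with the smallest of them at data[0].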

Origin blog.csdn.net/weixin_50886514/article/details/114928789