TopK problem (the k largest/smallest numbers)

First, a brief statement of the problem: given an integer array, find its k smallest numbers. For example, for the 8 numbers 4, 5, 1, 6, 2, 7, 3, and 8, the 4 smallest numbers are 1, 2, 3, 4.

Leetcode link; this blog draws on the official Leetcode solution.

The method for finding the k largest numbers is the same as for the k smallest. The following solutions are all written for the k smallest numbers.

1. Sort

The simplest idea is to sort directly! After sorting, the first k elements are the k smallest numbers. Here is the code:

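#include <vector>
#include <iostream>
#include <algorithm>
using namespace std;

// Sort ascending; afterwards the k smallest numbers are arr[0..k-1].
void getLeastNumbers(vector<int>& arr, int k) {
    sort(arr.begin(), arr.end());
}

int main() {
    ios::sync_with_stdio(false);
    vector<int> arr = {3, 2, 1, 4, 6, 5};
    int k = 3;
    getLeastNumbers(arr, k);
    for (int i = 0; i < k; i++) {   // i must start at 0; it was uninitialized before
        cout << arr[i] << " ";
    }
    return 0;
}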

Output result:

1 2 3

Analysis of Algorithms

  • Time complexity: O(n log n), where n is the length of the array arr. The time complexity of the algorithm is the time complexity of sorting.
  • Space complexity: O(log n), the additional space required for sorting.
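
As a hedged sketch of the sorting approach, the variant below copies the input and returns the k smallest values instead of sorting the caller's array in place. Passing std::greater flips it to the k largest, which illustrates the earlier remark that both cases use the same method. The names kSmallestBySort and kLargestBySort are hypothetical, and clamping an oversized k to the array length is an assumption that the original code does not make.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical helper: returns the k smallest values in ascending order.
std::vector<int> kSmallestBySort(std::vector<int> arr, int k) {
    assert(k >= 0);                                      // assumption: k is non-negative
    k = std::min<int>(k, static_cast<int>(arr.size()));  // assumption: clamp oversized k
    std::sort(arr.begin(), arr.end());                   // ascending order
    arr.resize(k);                                       // keep only the first k values
    return arr;
}

// Hypothetical helper: returns the k largest values in descending order.
std::vector<int> kLargestBySort(std::vector<int> arr, int k) {
    assert(k >= 0);
    k = std::min<int>(k, static_cast<int>(arr.size()));
    std::sort(arr.begin(), arr.end(), std::greater<int>());  // descending order
    arr.resize(k);
    return arr;
}
```

For the array {3, 2, 1, 4, 6, 5} and k = 3, kSmallestBySort should give 1 2 3 and kLargestBySort should give 6 5 4.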

2. Heap

We use priority_queue from the STL to maintain, in real time, a max-heap (big root heap) that holds the k smallest values seen so far. (If you need the k largest values instead, maintain them with a min-heap, i.e. a small root heap. priority_queue is a max-heap by default; extra template parameters turn it into a min-heap.)

Here is the code:

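#include <vector>
#include <iostream>
#include <queue>
using namespace std;

void getLeastNumbers(vector<int>& arr, int k) {
    // Guard: calling Q.top() on an empty heap is undefined behavior.
    if (k <= 0 || k > (int)arr.size()) return;
    priority_queue<int> Q;  // max-heap: top() is the largest of the k values kept
    for (int i = 0; i < k; ++i) Q.push(arr[i]);
    for (int i = k; i < (int)arr.size(); ++i) {
        if (arr[i] < Q.top()) {  // smaller than the current k-th smallest: replace it
            Q.pop();
            Q.push(arr[i]);
        }
    }
    for (int i = 0; i < k; ++i) {  // pops from largest to smallest
        cout << Q.top() << " ";
        Q.pop();
    }
}

int main() {
    ios::sync_with_stdio(false);
    vector<int> arr = {3, 2, 1, 4, 6, 5};
    int k = 3;
    getLeastNumbers(arr, k);
    return 0;
}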

Output result:

3 2 1

To maintain the k largest values instead, declare a min-heap (small root heap):

priority_queue<int, vector<int>, greater<int>>Q;
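
To show how that min-heap declaration is used, here is a minimal sketch of the mirrored algorithm for the k largest values. The function name kLargestByHeap and the choice to return a vector rather than print are assumptions made for illustration; the heap logic itself is the max-heap code with the comparison reversed.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: returns the k largest values, smallest of them first.
std::vector<int> kLargestByHeap(const std::vector<int>& arr, int k) {
    assert(k >= 0);  // assumption: k is non-negative
    std::vector<int> result;
    if (k == 0) return result;
    // Min-heap: top() is the smallest of the k largest values kept so far.
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    for (int x : arr) {
        if (static_cast<int>(heap.size()) < k) {
            heap.push(x);             // still filling the first k slots
        } else if (x > heap.top()) {
            heap.pop();               // evict the smallest kept value
            heap.push(x);
        }
    }
    while (!heap.empty()) {           // pops from smallest to largest
        result.push_back(heap.top());
        heap.pop();
    }
    return result;
}
```

For {3, 2, 1, 4, 6, 5} and k = 3 this should return 4 5 6. If k exceeds the array length, the sketch simply returns every element in ascending order.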

Analysis of Algorithms

  • Time complexity: O(n log k), where n is the length of the array arr. Since the max-heap holds the k smallest values at all times, each insertion and deletion costs O(log k). In the worst case all n numbers of the array are inserted, so the total is O(n log k).

  • Space complexity: O(k), because the max-heap holds at most k numbers.

3. Quicksort idea (quick select)

This borrows the idea of quicksort. Each partition step of quicksort divides the array into two parts, and finding the k smallest numbers amounts to dividing the array into two parts, one of which holds exactly k numbers that are all less than or equal to the split point.

We define randomized_selected(arr, l, r, k) to rearrange the range [l, r] of the array so that its k smallest numbers end up on the left. We call the quicksort partition function on the [l, r] part of the array. Suppose the partition position it returns is pos (values less than or equal to the pivot are to the left of pos, values greater than it are to the right). Then there are three cases:

  1. If pos - l + 1 == k, the pivot is the k-th smallest number, so the k values from l to pos are the k smallest numbers.
  2. If pos - l + 1 < k, the k-th smallest number is to the right of the pivot, so we recursively call randomized_selected(arr, pos + 1, r, k - (pos - l + 1)).
  3. If pos - l + 1 > k, the k-th smallest number is to the left of the pivot, so we recursively call randomized_selected(arr, l, pos - 1, k).

In this way, the split point is eventually found.

Here is the code part:

#include <vector>
#include <iostream>
#include <algorithm>
#include <cstdlib>
#include <ctime>
using namespace std;

// Quicksort partition step; the pivot is kept at the far right.
int partition(vector<int>& nums, int l, int r) {
    int pivot = nums[r];
    int i = l - 1;
    for (int j = l; j <= r - 1; ++j) {
        if (nums[j] <= pivot) {
            i = i + 1;
            swap(nums[i], nums[j]);
        }
    }
    swap(nums[i + 1], nums[r]);
    return i + 1;
}

// Partition around a randomly chosen pivot.
int randomized_partition(vector<int>& nums, int l, int r) {
    int i = rand() % (r - l + 1) + l;   // pick the pivot element at random
    swap(nums[r], nums[i]);
    return partition(nums, l, r);       // return the pivot's final position
}

void randomized_selected(vector<int>& arr, int l, int r, int k) {
    if (l >= r) return;
    int pos = randomized_partition(arr, l, r);
    int num = pos - l + 1;
    if (k == num) return;   // exactly k elements up to pos: done
    else if (k < num) randomized_selected(arr, l, pos - 1, k);  // otherwise keep partitioning
    else randomized_selected(arr, pos + 1, r, k - num);
}

void getLeastNumbers(vector<int>& arr, int k) {
    srand((unsigned)time(NULL));
    randomized_selected(arr, 0, (int)arr.size() - 1, k);
}

int main() {
    ios::sync_with_stdio(false);
    vector<int> arr = {3, 2, 1, 4, 6, 5};
    int k = 3;
    getLeastNumbers(arr, k);
    for (int i = 0; i < k; i++) {
        cout << arr[i] << " ";
    }
    return 0;
}

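Output result (the first k elements are always the k smallest numbers, but because the pivot is random their order can differ between runs, e.g. 2 1 3):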

1 2 3

Analysis of Algorithms

  • Time complexity: O(n) in expectation and O(n^2) in the worst case. In the worst case every split point is the maximum or minimum, so n - 1 partitions are needed in total, and each partition takes linear time O(n), which gives O(n^2).
  • Space complexity: O(log n) in expectation. The expected depth of the recursive calls is O(log n), and each level needs only O(1) space for a constant number of variables.

4. BFPRT algorithm

This is an improvement on the quicksort idea, whose time complexity reaches O(n^2) in the worst case. The BFPRT algorithm improves one specific step, the pivot selection (when the pivot is the maximum or minimum value, the quicksort-based algorithm degenerates to O(n^2)): it chooses the pivot by taking medians twice, first the median of each small group and then the median of those medians. For details, please refer to this blog, which also contains practical code.
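
The pivot-selection idea described above can be sketched as follows. This is a minimal, hedged sketch of median-of-medians (BFPRT) selection, not the code from the referenced blog: the group size of 5, the three-way partition, and the names partition3, bfprtSelect, and getLeastNumbersBfprt are all assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Three-way partition of a[l..r] around pivotValue.
// Returns {lt, gt}: a[l..lt-1] < pivot, a[lt..gt] == pivot, a[gt+1..r] > pivot.
static std::pair<int, int> partition3(std::vector<int>& a, int l, int r, int pivotValue) {
    int lt = l, i = l, gt = r;
    while (i <= gt) {
        if (a[i] < pivotValue) std::swap(a[lt++], a[i++]);
        else if (a[i] > pivotValue) std::swap(a[i], a[gt--]);
        else ++i;
    }
    return {lt, gt};
}

// Rearranges a[l..r] so that a[idx] holds the value it would have if the
// range were sorted (idx is an absolute index with l <= idx <= r).
static int bfprtSelect(std::vector<int>& a, int l, int r, int idx) {
    while (true) {
        if (r - l < 5) {  // small range: sorting it directly is cheap
            std::sort(a.begin() + l, a.begin() + r + 1);
            return a[idx];
        }
        // Step 1: take the median of each group of 5 and move it to the front.
        int numMedians = 0;
        for (int g = l; g <= r; g += 5) {
            int gEnd = std::min(g + 4, r);
            std::sort(a.begin() + g, a.begin() + gEnd + 1);
            std::swap(a[l + numMedians], a[g + (gEnd - g) / 2]);
            ++numMedians;
        }
        // Step 2: the pivot is the median of those medians (second median pass).
        int pivotValue = bfprtSelect(a, l, l + numMedians - 1, l + (numMedians - 1) / 2);
        // Step 3: partition around the pivot and keep only the side containing idx.
        std::pair<int, int> eq = partition3(a, l, r, pivotValue);
        if (idx < eq.first) r = eq.first - 1;
        else if (idx > eq.second) l = eq.second + 1;
        else return a[idx];
    }
}

// Hypothetical wrapper: moves the k smallest values into arr[0..k-1], in no particular order.
void getLeastNumbersBfprt(std::vector<int>& arr, int k) {
    assert(k >= 0 && k <= static_cast<int>(arr.size()));  // assumption: k is in range
    if (k == 0 || k == static_cast<int>(arr.size())) return;
    bfprtSelect(arr, 0, static_cast<int>(arr.size()) - 1, k - 1);
}
```

Calling getLeastNumbersBfprt(arr, 3) on {3, 2, 1, 4, 6, 5} should leave 1, 2, 3 in arr[0..2] in some order. With this pivot rule the worst case becomes O(n), at the cost of a larger constant factor than the randomized version.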

5. Call library functions directly

Since you have read this far, you have surely learned the previous methods, hahaha. What I did not expect is that there is a ready-made library function (it is a bit embarrassing that I had not found it before... QAQ).

The powerful STL has a magical function, nth_element, which puts the k-th smallest element in its place. It is very convenient (nothing too dramatic, though: it feels similar to the quicksort-style TopK I wrote myself, just more convenient).

The following code briefly shows how to use it; it should be clear at a glance:

int a[n];
nth_element(a, a + k, a + n);   // puts the element of sorted index k (0-based) at a[k]; a[0..k-1] are all <= a[k]
cout << a[k] << endl;
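
To connect the fragment above to the TopK problem, here is a small hedged sketch that uses std::nth_element on a std::vector to collect the k smallest values. The helper name kSmallestByNthElement and the decision to work on a copy are assumptions for illustration; the result is in unspecified order, since nth_element only partitions and does not fully sort.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: returns the k smallest values in unspecified order.
std::vector<int> kSmallestByNthElement(std::vector<int> arr, int k) {
    assert(k >= 0 && k <= static_cast<int>(arr.size()));  // assumption: k is in range
    // After this call, every element in [begin, begin + k) is <= every
    // element in [begin + k, end).
    std::nth_element(arr.begin(), arr.begin() + k, arr.end());
    arr.resize(k);  // keep only the k smallest values
    return arr;
}
```

For the 8 numbers 4, 5, 1, 6, 2, 7, 3, 8 and k = 4, the returned vector should contain 1, 2, 3, 4 in some order; sort it afterwards if ordered output is needed.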

Origin blog.csdn.net/weixin_44338712/article/details/108076737