TopK problem (the k largest/smallest numbers)
First, a brief statement of the problem: given an integer array, find its k smallest numbers. For example, for the 8 numbers 4, 5, 1, 6, 2, 7, 3, and 8, the 4 smallest numbers are 1, 2, 3, and 4.
LeetCode link. This post follows the official LeetCode solution.
The methods for finding the k largest and the k smallest numbers are the same, so all the solutions below find the k smallest numbers.
1. Sort
The simplest idea is to sort the array. After sorting, the first k elements are the k smallest numbers. Here is the code:
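#include <vector>
#include <iostream>
#include <algorithm>
using namespace std;
// Sort the whole array in ascending order; the k smallest numbers end up in arr[0..k-1].
void getLeastNumbers(vector<int>& arr, int k) {
    sort(arr.begin(), arr.end());
}
int main() {
    ios::sync_with_stdio(false);
    vector<int> arr = {3, 2, 1, 4, 6, 5};
    int k = 3;
    getLeastNumbers(arr, k);
    // Print the first k elements (the loop counter must start at 0).
    for (int i = 0; i < k; i++) {
        cout << arr[i] << " ";
    }
    return 0;
}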
Output result:
1 2 3
Analysis of Algorithms
- Time complexity: O(nlogn), where n is the length of the array arr. The time complexity of the algorithm is that of sorting.
- Space complexity: O(logn), the extra space required by sorting.
2. Heap
We use the STL priority_queue as a max-heap that keeps the k smallest values seen so far, updating it as we scan the array. (For the k largest values, use a min-heap instead. priority_queue is a max-heap by default; extra template parameters turn it into a min-heap.)
Here is the code:
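#include <vector>
#include <iostream>
#include <queue>
using namespace std;
void getLeastNumbers(vector<int>& arr, int k) {
    // Guard: Q.top() on an empty heap is undefined behavior.
    if (k <= 0 || k > (int)arr.size()) return;
    priority_queue<int> Q;  // max-heap: Q.top() is the largest of the k kept values
    for (int i = 0; i < k; ++i) Q.push(arr[i]);
    for (int i = k; i < (int)arr.size(); ++i) {
        // A value smaller than the current maximum replaces it.
        if (arr[i] < Q.top()) {
            Q.pop();
            Q.push(arr[i]);
        }
    }
    // Popping a max-heap prints the k smallest values in descending order.
    for (int i = 0; i < k; ++i) {
        cout << Q.top() << " ";
        Q.pop();
    }
}
int main() {
    ios::sync_with_stdio(false);
    vector<int> arr = {3, 2, 1, 4, 6, 5};
    int k = 3;
    getLeastNumbers(arr, k);
    return 0;
}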
Output result:
3 2 1
To maintain the k largest values instead, use a min-heap:
priority_queue<int, vector<int>, greater<int>>Q;
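As a minimal sketch of how that min-heap declaration could be used for the k largest values: the function name getLargestNumbers and the choice to return a vector are assumptions made for illustration, not part of the original solution. The sketch also assumes 0 <= k <= arr.size().

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: returns the k largest values of arr in ascending order.
// Assumes 0 <= k <= arr.size().
std::vector<int> getLargestNumbers(const std::vector<int>& arr, int k) {
    assert(k >= 0 && k <= static_cast<int>(arr.size()));
    std::vector<int> result;
    if (k == 0) return result;

    // Min-heap: top() is the smallest of the k largest values seen so far.
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    for (int x : arr) {
        if (static_cast<int>(heap.size()) < k) {
            heap.push(x);
        } else if (x > heap.top()) {
            // x beats the weakest kept value, so it takes its place.
            heap.pop();
            heap.push(x);
        }
    }
    // Popping a min-heap yields the kept values from smallest to largest.
    while (!heap.empty()) {
        result.push_back(heap.top());
        heap.pop();
    }
    return result;
}
```

For the sample array 3, 2, 1, 4, 6, 5 with k = 3, this returns 4, 5, 6. The comparison is the mirror image of the max-heap version: a new value is kept only when it is larger than the heap top.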
Analysis of Algorithms
- Time complexity: O(nlogk), where n is the length of the array arr. The max-heap holds the k smallest values seen so far, so each insertion and deletion takes O(logk) time. In the worst case all n numbers are inserted, for O(nlogk) in total.
- Space complexity: O(k), because the max-heap holds at most k numbers.
3. The quicksort idea (quickselect)
We can borrow the idea of quicksort. Each partition step of quicksort splits the array into two parts around a split point. Finding the k smallest numbers amounts to splitting the array into two parts such that one part holds exactly k numbers, all of them no larger than the split point.
We define randomized_selected(arr, l, r, k) to rearrange the range [l, r] of the array so that the k smallest numbers in that range end up on its left. We call the quicksort partition function on the [l, r] part of the array. Suppose the pivot lands at position pos (values no larger than the pivot are on its left, values greater than the pivot are on its right). Then there are three cases:
- If pos - l + 1 == k, the pivot is the k-th smallest number, so the pivot together with everything to its left gives the k smallest numbers.
- If pos - l + 1 < k, the k-th smallest number is to the right of the pivot, so we recursively call randomized_selected(arr, pos + 1, r, k - (pos - l + 1)).
- If pos - l + 1 > k, the k-th smallest number is to the left of the pivot, so we recursively call randomized_selected(arr, l, pos - 1, k).
In this way, the split point is eventually found.
Here is the code part:
#include <vector>
#include <iostream>
#include <algorithm>
#include <cstdlib>
#include <ctime>
using namespace std;
// Quicksort partition step, with the pivot placed at the rightmost position
int partition(vector<int>& nums, int l, int r) {
    int pivot = nums[r];
    int i = l - 1;
    for (int j = l; j <= r - 1; ++j) {
        if (nums[j] <= pivot) {
            i = i + 1;
            swap(nums[i], nums[j]);
        }
    }
    swap(nums[i + 1], nums[r]);
    return i + 1;
}
// Randomized partition
int randomized_partition(vector<int>& nums, int l, int r) {
    int i = rand() % (r - l + 1) + l; // pick a random pivot element
    swap(nums[r], nums[i]);
    return partition(nums, l, r); // return the pivot position pos
}
void randomized_selected(vector<int>& arr, int l, int r, int k) {
    if (l >= r) return;
    int pos = randomized_partition(arr, l, r);
    int num = pos - l + 1; // number of elements in arr[l..pos]
    if (k == num) return; // the pivot is exactly the k-th element, so return
    else if (k < num) randomized_selected(arr, l, pos - 1, k); // otherwise keep partitioning
    else randomized_selected(arr, pos + 1, r, k - num);
}
void getLeastNumbers(vector<int>& arr, int k) {
    srand((unsigned)time(NULL));
    randomized_selected(arr, 0, (int)arr.size() - 1, k);
}
int main() {
    ios::sync_with_stdio(false);
    vector<int> arr = {3, 2, 1, 4, 6, 5};
    int k = 3;
    getLeastNumbers(arr, k);
    for (int i = 0; i < k; i++) {
        cout << arr[i] << " ";
    }
    return 0;
}
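Output result (one possible order; the first k elements are the k smallest, but the random pivots decide how they are arranged):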
1 2 3
Analysis of Algorithms
- Time complexity: O(n) in expectation and O(n^2) in the worst case. In the worst case every pivot is the maximum or minimum of its range, so n-1 partitions are needed. Each partition takes linear time O(n), so the worst-case time complexity is O(n^2).
- Space complexity: O(logn) in expectation. The expected depth of the recursive calls is O(logn), and each level needs only O(1) space for a constant number of variables.
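The worst-case bound can be checked with a short sum. As an assumption for illustration, let one partition over m elements cost c·m for some constant c. In the worst case the partitioned ranges have sizes n, n-1, ..., 2, which is n-1 partitions in total:

```latex
T_{\text{worst}}(n) \;=\; \sum_{m=2}^{n} c\,m
\;=\; c\left(\frac{n(n+1)}{2} - 1\right)
\;=\; O(n^2)
```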
4. BFPRT algorithm
This is an improvement on the quicksort idea above, whose time complexity degrades to O(n^2) in the worst case. The BFPRT algorithm changes how the pivot is selected during partitioning (when the pivot is the maximum or minimum value, the algorithm degenerates to O(n^2)): it picks the pivot by taking medians twice, first the median of each small group and then the median of those medians. For details, please refer to this blog, which also contains working code.
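The following is a rough sketch of the median-of-medians idea, written for this post rather than taken from the referenced blog. The names partition3, medianOfMedians, bfprtSelect and getLeastNumbersBfprt are hypothetical, and the groups of 5 and the three-way partition are assumptions chosen to keep the sketch short. It also assumes 0 <= k <= arr.size().

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Three-way partition of nums[l..r] around pivotValue. Returns {lt, gt} with
// nums[l..lt-1] < pivotValue, nums[lt..gt] == pivotValue, nums[gt+1..r] > pivotValue.
std::pair<int, int> partition3(std::vector<int>& nums, int l, int r, int pivotValue) {
    int lt = l, i = l, gt = r;
    while (i <= gt) {
        if (nums[i] < pivotValue) {
            std::swap(nums[lt], nums[i]);
            ++lt;
            ++i;
        } else if (nums[i] > pivotValue) {
            std::swap(nums[i], nums[gt]);
            --gt;
        } else {
            ++i;
        }
    }
    return {lt, gt};
}

int bfprtSelect(std::vector<int>& nums, int l, int r, int rank);

// Picks a pivot value: the median of the medians of groups of 5 in nums[l..r].
int medianOfMedians(std::vector<int>& nums, int l, int r) {
    int numGroups = 0;
    for (int start = l; start <= r; start += 5) {
        int end = std::min(start + 4, r);
        std::sort(nums.begin() + start, nums.begin() + end + 1);
        // Collect each group's median at the front of the range.
        std::swap(nums[l + numGroups], nums[start + (end - start) / 2]);
        ++numGroups;
    }
    return bfprtSelect(nums, l, l + numGroups - 1, (numGroups - 1) / 2);
}

// Returns the value with 0-based rank `rank` within nums[l..r] and leaves it
// at index l + rank, with no larger value before it and no smaller value after it.
int bfprtSelect(std::vector<int>& nums, int l, int r, int rank) {
    while (true) {
        if (r - l + 1 <= 5) {  // small range: sorting it is constant work
            std::sort(nums.begin() + l, nums.begin() + r + 1);
            return nums[l + rank];
        }
        int pivotValue = medianOfMedians(nums, l, r);
        std::pair<int, int> eq = partition3(nums, l, r, pivotValue);
        int target = l + rank;
        if (target < eq.first) {
            r = eq.first - 1;               // answer is in the "< pivot" part
        } else if (target > eq.second) {
            rank = target - (eq.second + 1);
            l = eq.second + 1;              // answer is in the "> pivot" part
        } else {
            return nums[target];            // answer equals the pivot
        }
    }
}

// Hypothetical entry point: returns the k smallest values in unspecified order.
std::vector<int> getLeastNumbersBfprt(std::vector<int> arr, int k) {
    assert(k >= 0 && k <= static_cast<int>(arr.size()));
    if (k > 0 && k < static_cast<int>(arr.size())) {
        bfprtSelect(arr, 0, static_cast<int>(arr.size()) - 1, k - 1);
    }
    arr.resize(k);
    return arr;
}
```

Compared with the random pivot, the only change is where the pivot comes from. The rest is the same partition-and-narrow loop, here written iteratively for the outer search.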
5. Call library functions directly
Since you have read this far, you have presumably learned the previous methods, hahaha. It turns out there is a ready-made library function (embarrassingly, I had not found it before... QAQ).
The STL has a handy function, nth_element, which rearranges a range so that the element at a chosen position is the one that would be there if the range were sorted, with nothing larger before it and nothing smaller after it. It is very convenient, though not magic: it does much the same job as the hand-written quicksort-style TopK above.
The following code briefly shows how to use it; it should be clear at a glance:
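int a[n]; // n elements, filled in elsewhere
nth_element(a, a + k, a + n); // a[k] now holds the element with 0-based rank k; a[0..k-1] are all <= a[k]
cout << a[k] << endl;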
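For the TopK problem itself, a self-contained sketch might look like the following. The wrapper name smallestK is hypothetical, the final sort is optional and only there to give a stable order, and the sketch assumes 0 <= k <= arr.size().

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical wrapper: returns the k smallest values of arr in ascending order.
// arr is taken by value so the caller's data is not reordered.
// Assumes 0 <= k <= arr.size().
std::vector<int> smallestK(std::vector<int> arr, int k) {
    assert(k >= 0 && k <= static_cast<int>(arr.size()));
    if (k < static_cast<int>(arr.size())) {
        // After this call, arr[0..k-1] holds the k smallest values in some order.
        std::nth_element(arr.begin(), arr.begin() + k, arr.end());
    }
    arr.resize(k);
    // Optional: sort the k kept values (O(k log k)) for a predictable order.
    std::sort(arr.begin(), arr.end());
    return arr;
}
```

For the sample array 3, 2, 1, 4, 6, 5 with k = 3, this returns 1, 2, 3. Without the final sort the same three values come back, but in an order that depends on the library implementation.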