Article directory
Detailed Explanation and Application of Quick Sort
What is quick sort?
1. Quick sort operates on arrays, so what does it have to do with a binary tree?
The first thing to know is that every recursive algorithm, regardless of what it does, is essentially traversing a (recursive) tree and executing code at each node (in the pre-order, in-order, or post-order position). Writing a recursive algorithm essentially amounts to telling each node what to do.
Now look at the code framework of quick sort:
void sort(int[] nums, int low, int high) {
    if (low >= high) {
        return;
    }
    // Partition nums[low..high] so that
    // nums[low..p-1] <= nums[p] < nums[p+1..high]
    int p = partition(nums, low, high);
    // Recurse into the left and right subarrays
    sort(nums, low, p - 1);
    sort(nums, p + 1, high);
}
Clearly, quick sort first partitions the entire array, then partitions the left and right subarrays respectively. In other words, quick sort puts one element into its final place first, then sorts the remaining elements.
The core of quick sort is the partition function. Its job is to find a dividing point p in nums[low..high] and, by swapping elements, make nums[low..p-1] less than or equal to nums[p] and nums[p+1..high] greater than nums[p].
What does it mean for every element left of an element to be smaller than it and every element right of it to be larger?
It means that after one round of partitioning, nums[p] sits in its correct final position (that is, nums[p] is already sorted).
The next step is to sort the remaining elements.
What are the remaining elements? They are distributed to the left and right, so we can recurse into the subarrays and use the partition function to sort them.
This is exactly the shape of a pre-order traversal of a binary tree:
/* Binary tree traversal framework */
void traverse(TreeNode root) {
    if (root == null) {
        return;
    }
    /****** pre-order position ******/
    print(root.val);
    /********************************/
    traverse(root.left);
    traverse(root.right);
}
Therefore, the process of quick sort can be abstracted as a binary tree: the subarray nums[lo..hi] is understood as the value at a node, and the sort function is understood as the tree's traversal function.
Following the pre-order traversal order of the binary tree, quick sort proceeds as shown in the figure: the second array starts out empty and, after repeated partitions, is filled in by color order.
(figure: image-20230403094217056.png)
It can be noticed that the binary tree formed at last is a binary search tree.
Why is it a binary search tree in the end? Is it a coincidence?
It's not a coincidence. The partition function divides the array into a left and a right part each time, which matches the defining property of a binary search tree: smaller on the left, larger on the right.
Therefore, we can also understand the process of quick sort as the process of constructing a binary search tree.
But when it comes to the construction of the binary search tree, we have to talk about the extreme case of the unbalanced binary search tree. In extreme cases, the binary search tree will degenerate into a linked list, resulting in a significant reduction in operating efficiency.
Quick sort has a similar problem. In the figure I drew, the dividing point chosen by the partition function happens to split nums[low..high] into two halves every time, but in practice you may not be so lucky. If you are particularly unlucky and one side always ends up with very few elements, the binary tree grows unbalanced:
In this case, the time complexity rises significantly.
Solution:
To avoid this extreme situation, we need to introduce randomness. Two common ways are shuffling the array first, or randomly selecting an element inside the partition function as the dividing point. This article uses the former.
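For comparison, here is a minimal sketch of the second option, picking a random pivot inside the partition function itself (the class and helper names here are illustrative, not from the article):

```java
import java.util.Random;

class RandomPivotDemo {
    static final Random RAND = new Random();

    // Before partitioning, swap a randomly chosen element into nums[lo]
    // so that the rest of partition can keep assuming the pivot sits at lo.
    static int partition(int[] nums, int lo, int hi) {
        int r = lo + RAND.nextInt(hi - lo + 1); // random index in [lo, hi]
        swap(nums, lo, r);
        int pivot = nums[lo];
        int i = lo + 1, j = hi;
        while (i <= j) {
            while (i < hi && nums[i] <= pivot) i++;
            while (j > lo && nums[j] > pivot) j--;
            if (i >= j) break;
            swap(nums, i, j);
        }
        swap(nums, lo, j);
        return j;
    }

    static void sort(int[] nums, int lo, int hi) {
        if (lo >= hi) return;
        int p = partition(nums, lo, hi);
        sort(nums, lo, p - 1);
        sort(nums, p + 1, hi);
    }

    static void swap(int[] nums, int i, int j) {
        int t = nums[i]; nums[i] = nums[j]; nums[j] = t;
    }
}
```

Swapping the randomly chosen element into position lo first lets the rest of partition keep its usual assumption that the pivot lives at the left end.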
2. Quick sort code implementation
import java.util.Random;

class Quick {
    public static void sort(int[] nums) {
        // Shuffle first to avoid the time-consuming extreme cases
        shuffle(nums);
        // Sort the whole array (in place)
        sort(nums, 0, nums.length - 1);
    }

    private static void sort(int[] nums, int lo, int hi) {
        if (lo >= hi) {
            return;
        }
        // Partition nums[lo..hi] so that
        // nums[lo..p-1] <= nums[p] < nums[p+1..hi]
        int p = partition(nums, lo, hi);
        sort(nums, lo, p - 1);
        sort(nums, p + 1, hi);
    }

    // Partition nums[lo..hi]
    private static int partition(int[] nums, int lo, int hi) {
        int pivot = nums[lo];
        // Boundary handling needs extra care; a small slip produces bugs.
        // Here i and j are defined via half-open intervals:
        // [lo, i) <= pivot; (j, hi] > pivot
        // This invariant must be maintained correctly throughout.
        int i = lo + 1, j = hi;
        // Loop until i > j so that the whole range [lo, hi] is covered
        while (i <= j) {
            while (i < hi && nums[i] <= pivot) {
                i++;
                // when this loop ends, nums[i] > pivot
            }
            while (j > lo && nums[j] > pivot) {
                j--;
                // when this loop ends, nums[j] <= pivot
            }
            // now [lo, i) <= pivot && (j, hi] > pivot
            if (i >= j) {
                break;
            }
            swap(nums, i, j);
        }
        // Put the pivot in its final place: smaller elements on its left,
        // larger elements on its right
        swap(nums, lo, j);
        return j;
    }

    // Shuffle: randomly permute the input array
    private static void shuffle(int[] nums) {
        Random rand = new Random();
        int n = nums.length;
        for (int i = 0; i < n; i++) {
            // random index in [i, n - 1]
            int r = i + rand.nextInt(n - i);
            swap(nums, i, r);
        }
    }

    // Swap two array elements in place
    private static void swap(int[] nums, int i, int j) {
        int temp = nums[i];
        nums[i] = nums[j];
        nums[j] = temp;
    }
}
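To check the partition invariant concretely, here is the partition logic above extracted into a standalone sketch, together with a helper that verifies nums[lo..p-1] <= nums[p] < nums[p+1..hi] (the class and helper names are illustrative):

```java
class PartitionCheck {
    static int partition(int[] nums, int lo, int hi) {
        int pivot = nums[lo];
        int i = lo + 1, j = hi;
        while (i <= j) {
            while (i < hi && nums[i] <= pivot) i++;
            while (j > lo && nums[j] > pivot) j--;
            if (i >= j) break;
            swap(nums, i, j);
        }
        swap(nums, lo, j);
        return j;
    }

    static void swap(int[] nums, int i, int j) {
        int t = nums[i]; nums[i] = nums[j]; nums[j] = t;
    }

    // Returns true iff nums[lo..p-1] <= nums[p] < nums[p+1..hi]
    static boolean invariantHolds(int[] nums, int lo, int hi, int p) {
        for (int k = lo; k < p; k++) if (nums[k] > nums[p]) return false;
        for (int k = p + 1; k <= hi; k++) if (nums[k] <= nums[p]) return false;
        return true;
    }
}
```

After one call on {5, 9, 2, 7, 1, 8}, the pivot 5 ends up at the returned index with {2, 1} on its left and {7, 9, 8} on its right, even though neither side is sorted yet.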
Note: the implementation of the core partition function here is similar to binary search. Finding the dividing point correctly tests your control of boundary conditions; a slight mistake produces wrong results.
A trick for handling boundary details: pin down the definition of each variable and whether each end of the interval is open or closed.
There are actually two conventions:
If high = length - 1, the loop must exit when i > j, because exiting at i == j would leave the element that i and j point to uncovered.
If high = length, the loop exits when i == j, because at that point all of [low..high] has been covered, so i == j is exactly the pivot's position.
3. Complexity analysis
Similar to merge sort, this is best analyzed as a whole, together with the traversal of the sorting tree: the partition function runs once per tree node, and each run costs time proportional to the length of nums[lo..hi], so the total time complexity is the total count of "array elements" across the entire tree.
3.1 Time Complexity
Assuming the array has N elements, the sum of element counts across each level of the binary tree is O(N); ideally the tree is balanced, with O(logN) levels, so the ideal total time complexity is O(NlogN).
3.2 Space Complexity
Since quick sort uses no auxiliary array, the space complexity is the depth of the recursion stack, i.e. the tree height O(logN).
3.3 Time Complexity of Special Cases
However, the efficiency of quick sort has some randomness in it. If every partition splits extremely unevenly, quick sort degenerates into something like selection sort: the tree height becomes O(N), the number of elements per level shrinks by one from N, and the total time complexity is N + (N - 1) + (N - 2) + ... + 1 = O(N^2).
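This degeneration can be observed directly. The sketch below instruments the un-shuffled sort with a counter for the total subarray length handled by partition; on an already-sorted array of n elements the pivot is always the minimum, so the cost is n + (n - 1) + ... + 2, i.e. quadratic (class and field names are illustrative):

```java
class WorstCaseDemo {
    static long work = 0; // total subarray length handled by partition

    static void sort(int[] nums, int lo, int hi) {
        if (lo >= hi) return;
        int p = partition(nums, lo, hi);
        sort(nums, lo, p - 1);
        sort(nums, p + 1, hi);
    }

    static int partition(int[] nums, int lo, int hi) {
        work += hi - lo + 1; // each call scans the whole subarray
        int pivot = nums[lo];
        int i = lo + 1, j = hi;
        while (i <= j) {
            while (i < hi && nums[i] <= pivot) i++;
            while (j > lo && nums[j] > pivot) j--;
            if (i >= j) break;
            int t = nums[i]; nums[i] = nums[j]; nums[j] = t;
        }
        int t = nums[lo]; nums[lo] = nums[j]; nums[j] = t;
        return j;
    }

    // Without shuffling, an already-sorted array always picks the
    // smallest element as pivot, so work = n + (n - 1) + ... + 2.
    static long costOfSortedInput(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i;
        work = 0;
        sort(a, 0, n - 1);
        return work;
    }
}
```

For n = 100 this measures 5049 scanned elements (100 + 99 + ... + 2), versus roughly n·log n for a balanced split.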
3.4 In summary
The ideal time complexity of quick sort is O(NlogN) with space complexity O(logN); in the extreme worst case, the time complexity is O(N^2) and the space complexity is O(N).
However, with the randomized partition function the extreme case is very unlikely, so in practice quick sort is still highly efficient.
3.5 Contrast Merge Sort
Quick sort is an "unstable sort"; by contrast, merge sort is a "stable sort".
3.5.1 What is a stable sort? What is an unstable sort?
For equal elements in a sequence, if their relative order does not change after sorting, the algorithm is called a "stable sort"; otherwise it is an "unstable sort".
3.5.2 What are the advantages of stable sorting?
Stability means nothing when sorting a plain int array. But when sorting data with a more complex structure, a stable sort has a real advantage.
For example, suppose you have a batch of order records already sorted by order number, and you now want to sort them by transaction date:
With a stable sorting algorithm (such as merge sort), the orders end up sorted by transaction date, and orders sharing the same transaction date remain in order-number order.
With an unstable algorithm (such as quick sort), the result is still sorted by transaction date, but orders sharing the same transaction date lose their order-number ordering.
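Java's own library illustrates the point: Arrays.sort on an object array is documented to be a stable merge sort variant, so a second sort by date keeps the earlier order-number ordering for equal dates (the order data below is invented for illustration):

```java
import java.util.Arrays;
import java.util.Comparator;

class StableSortDemo {
    public static void main(String[] args) {
        // {orderId, tradeDate} pairs, already sorted by orderId
        int[][] orders = {
            {101, 20230401}, {102, 20230403}, {103, 20230401},
            {104, 20230402}, {105, 20230401}
        };
        // Arrays.sort on objects is stable, so orders sharing a date
        // stay in orderId order after sorting by date.
        Arrays.sort(orders, Comparator.comparingInt((int[] o) -> o[1]));
        for (int[] o : orders) {
            System.out.println(o[0] + " " + o[1]);
        }
    }
}
```

The three orders dated 20230401 come out as 101, 103, 105, preserving their original order-number order.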
3.5.3 Why focus on stability?
In real projects we often sort complex objects by some key field, so it pays to know which sorting algorithm the API provided by the programming language uses under the hood and whether it is stable; this can affect the efficiency and even the correctness of the code.
A variant of the quicksort algorithm: the quickselect algorithm
215. The Kth Largest Element in an Array
There are two approaches to this problem: a binary heap (priority queue) solution, and the quickselect algorithm.
Compare the two solutions:
1. Binary heap:
The binary heap solution is relatively simple, but its time complexity is slightly higher.
int findKthLargest(int[] nums, int k) {
    // Min-heap: the top is the smallest element
    PriorityQueue<Integer> pq = new PriorityQueue<>();
    for (int e : nums) {
        // Every element passes through the heap
        pq.offer(e);
        // When the heap holds more than k elements, remove the top
        if (pq.size() > k) {
            pq.poll();
        }
    }
    // What remains in pq are the k largest elements of nums;
    // the top is the smallest of them, i.e. the kth largest
    return pq.peek();
}
Note: a binary heap (priority queue) is a data structure that keeps its elements ordered automatically.
The analysis: think of the min-heap pq as a sieve: larger elements sink down while smaller elements float to the top. Whenever the heap size exceeds k, we delete the top element, because those elements are relatively small and what we want is the kth largest. (After all n elements have passed through, n - k elements have been deleted, so the k elements remaining in the min-heap are the k largest, and its top is the kth largest.)
Time complexity analysis:
The cost of inserting into or deleting from a binary heap depends on the number of elements in it. Here the heap never holds more than k elements, so each insertion or deletion costs O(logk); wrapped in a for loop over all N array elements, the total time complexity is O(Nlogk).
Space complexity analysis:
The space complexity of this solution is clearly the size of the binary heap, O(k).
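Wrapped into a standalone class, the heap solution above can be exercised like this (the class name is illustrative; the sample inputs are LeetCode 215's examples):

```java
import java.util.PriorityQueue;

class KthLargestHeap {
    static int findKthLargest(int[] nums, int k) {
        // Min-heap: the smallest of the k largest stays on top
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        for (int e : nums) {
            pq.offer(e);
            if (pq.size() > k) {
                pq.poll(); // discard elements too small to be top-k
            }
        }
        return pq.peek();
    }
}
```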
2. Quick Selection
The quickselect algorithm is a variant of quick sort and is more efficient. Being able to write it in an interview is definitely a plus.
Analysis and implementation:
The problem asks for the kth largest element, which is equivalent to finding the element of rank n - k in the array's ascending order.
So how do we find the element of rank n - k directly?
In fact, the partition function at the heart of quick sort already gives us a hint.
As discussed above, the partition function places nums[p] in its correct position, so that nums[low..p-1] <= nums[p] < nums[p+1..high].
At this point, although the whole array is not yet sorted, every element to the left of nums[p] is less than or equal to it, so we already know the rank of nums[p].
So we can compare p with the target rank k' (here k' = n - k): if p < k', the element of rank k' lies in nums[p+1..hi]; if p > k', it lies in nums[lo..p-1].
Going one step further, we run the partition function again on nums[p+1..hi] or nums[lo..p-1], further narrowing the range containing the rank-k' element until we find the target.
public int findKthLargest(int[] nums, int k) {
    // Shuffle first to avoid the worst case
    shuffle(nums);
    int newK = nums.length - k;
    int low = 0, high = nums.length - 1;
    while (low <= high) {
        int p = partition(nums, low, high);
        if (newK < p) {
            high = p - 1;
        } else if (newK > p) {
            low = p + 1;
        } else {
            return nums[p];
        }
    }
    return -1;
}

public int partition(int[] nums, int low, int high) {
    int pivot = nums[low];
    int i = low + 1;
    int j = high;
    while (i <= j) {
        while (i < high && nums[i] <= pivot) {
            i++;
        }
        while (j > low && nums[j] > pivot) {
            j--;
        }
        if (i >= j) {
            break;
        }
        swap(nums, i, j);
    }
    swap(nums, low, j);
    return j;
}

public void shuffle(int[] nums) {
    Random random = new Random();
    int n = nums.length;
    for (int i = 0; i < n; i++) {
        int r = i + random.nextInt(n - i);
        swap(nums, i, r);
    }
}

public void swap(int[] nums, int a, int b) {
    int temp = nums[a];
    nums[a] = nums[b];
    nums[b] = temp;
}
This code framework is very similar to the binary search code mentioned earlier, which is part of why this algorithm is efficient.
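Gathered into one standalone class so the sketch compiles on its own, the quickselect code above can be exercised on LeetCode 215's sample inputs:

```java
import java.util.Random;

class QuickSelectDemo {
    static int findKthLargest(int[] nums, int k) {
        shuffle(nums);
        int target = nums.length - k; // rank in ascending order
        int lo = 0, hi = nums.length - 1;
        while (lo <= hi) {
            int p = partition(nums, lo, hi);
            if (target < p) hi = p - 1;
            else if (target > p) lo = p + 1;
            else return nums[p];
        }
        return -1;
    }

    static int partition(int[] nums, int lo, int hi) {
        int pivot = nums[lo];
        int i = lo + 1, j = hi;
        while (i <= j) {
            while (i < hi && nums[i] <= pivot) i++;
            while (j > lo && nums[j] > pivot) j--;
            if (i >= j) break;
            swap(nums, i, j);
        }
        swap(nums, lo, j);
        return j;
    }

    static void shuffle(int[] nums) {
        Random rand = new Random();
        for (int i = 0; i < nums.length; i++) {
            swap(nums, i, i + rand.nextInt(nums.length - i));
        }
    }

    static void swap(int[] nums, int i, int j) {
        int t = nums[i]; nums[i] = nums[j]; nums[j] = t;
    }
}
```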
Time Complexity Analysis
Clearly the running time of this algorithm is also concentrated in the partition function, so we need to estimate how many times it runs and what each run costs.
In the best case, each partition point p lands exactly at the middle index (lo + hi) / 2, and after each split we recurse into only one of the two subarrays, so each call handles an input half the size of the previous one. The total time is therefore the geometric series:
N + N/2 + N/4 + N/8 + ... + 1 = 2N = O(N)
Similar to quick sort, the partition function can also hit extreme cases: in the worst case p is always lo + 1 or always hi - 1, and the time complexity degenerates to O(N^2), since the total time becomes:
N + (N - 1) + (N - 2) + ... + 1 = O(N^2)
To avoid this worst case, the code shuffles the array first, introducing randomness so that extreme cases are vanishingly unlikely and the algorithm stays efficient. The expected complexity of randomized quickselect can be taken as O(N).