Article directory
Detailed Explanation and Application of Quick Sort
What is quick sort?
1. Quick sort operates on arrays, so what does it have to do with a binary tree?
The first thing to know is that every recursive algorithm, regardless of what it does, is essentially traversing a (recursive) tree and executing code at each node (in the pre-order, in-order, or post-order position). Writing a recursive algorithm essentially amounts to telling each node what to do.
Now look at the code framework of quick sort:
void sort(int[] nums, int low, int high) {
    if (low >= high) {
        return;
    }
    // Partition nums[low..high] so that
    // nums[low..p-1] <= nums[p] < nums[p+1..high]
    int p = partition(nums, low, high);
    // Recurse into the left and right subarrays
    sort(nums, low, p - 1);
    sort(nums, p + 1, high);
}
Clearly, quick sort first partitions the entire array, then partitions the left and right subarrays respectively. In other words, quick sort puts one element into its final place first, then sorts the remaining elements.
The core of quick sort is the partition function. Its job is to find a dividing point p in nums[low..high] and, by swapping elements, make nums[low..p-1] less than or equal to nums[p] and nums[p+1..high] greater than nums[p].
What does it mean for every element left of an element to be smaller than it and every element right of it to be larger?
It means that after one round of partitioning, nums[p] sits in its correct final position (that is, nums[p] is already sorted).
The next step is to sort the remaining elements.
What are the remaining elements? They are distributed to the left and right, so we can recurse into the subarrays and use the partition function to sort them.
This is exactly the shape of a pre-order traversal of a binary tree:
/* Binary tree traversal framework */
void traverse(TreeNode root) {
    if (root == null) {
        return;
    }
    /****** pre-order position ******/
    print(root.val);
    /********************************/
    traverse(root.left);
    traverse(root.right);
}
Therefore, the process of quick sort can be abstracted as a binary tree: the subarray nums[lo..hi] is understood as the value at a node, and the sort function is understood as the tree's traversal function.
Following the pre-order traversal order of the binary tree, quick sort proceeds as shown in the figure: the second array starts out empty and, after repeated partitions, is filled in by color order.
(figure: image-20230403094217056.png)
It can be noticed that the binary tree formed at last is a binary search tree.
Why is it a binary search tree in the end? Is it a coincidence?
It's not a coincidence. The partition function divides the array into a left and a right part each time, which matches the defining property of a binary search tree: smaller on the left, larger on the right.
Therefore, we can also understand the process of quick sort as the process of constructing a binary search tree.
But when it comes to the construction of the binary search tree, we have to talk about the extreme case of the unbalanced binary search tree. In extreme cases, the binary search tree will degenerate into a linked list, resulting in a significant reduction in operating efficiency.
Quick sort has a similar problem. In the figure I drew, the dividing point chosen by the partition function happens to split nums[low..high] into two halves every time, but in practice you may not be so lucky. If you are particularly unlucky and one side always ends up with very few elements, the binary tree grows unbalanced:
In this case, the time complexity rises significantly.
Solution:
To avoid this extreme situation, we need to introduce randomness. Two common ways are shuffling the array first, or randomly selecting an element inside the partition function as the dividing point. This article uses the former.
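For comparison, here is a minimal sketch of the second option, picking a random pivot inside the partition function itself (the class and helper names here are illustrative, not from the article):

```java
import java.util.Random;

class RandomPivotDemo {
    static final Random RAND = new Random();

    // Before partitioning, swap a randomly chosen element into nums[lo]
    // so that the rest of partition can keep assuming the pivot sits at lo.
    static int partition(int[] nums, int lo, int hi) {
        int r = lo + RAND.nextInt(hi - lo + 1); // random index in [lo, hi]
        swap(nums, lo, r);
        int pivot = nums[lo];
        int i = lo + 1, j = hi;
        while (i <= j) {
            while (i < hi && nums[i] <= pivot) i++;
            while (j > lo && nums[j] > pivot) j--;
            if (i >= j) break;
            swap(nums, i, j);
        }
        swap(nums, lo, j);
        return j;
    }

    static void sort(int[] nums, int lo, int hi) {
        if (lo >= hi) return;
        int p = partition(nums, lo, hi);
        sort(nums, lo, p - 1);
        sort(nums, p + 1, hi);
    }

    static void swap(int[] nums, int i, int j) {
        int t = nums[i]; nums[i] = nums[j]; nums[j] = t;
    }
}
```

Swapping the randomly chosen element into position lo first lets the rest of partition keep its usual assumption that the pivot lives at the left end.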
2. Quick sort code implementation
import java.util.Random;

class Quick {
    public static void sort(int[] nums) {
        // Shuffle first to avoid the time-consuming extreme cases
        shuffle(nums);
        // Sort the whole array (in place)
        sort(nums, 0, nums.length - 1);
    }

    private static void sort(int[] nums, int lo, int hi) {
        if (lo >= hi) {
            return;
        }
        // Partition nums[lo..hi] so that
        // nums[lo..p-1] <= nums[p] < nums[p+1..hi]
        int p = partition(nums, lo, hi);
        sort(nums, lo, p - 1);
        sort(nums, p + 1, hi);
    }

    // Partition nums[lo..hi]
    private static int partition(int[] nums, int lo, int hi) {
        int pivot = nums[lo];
        // Boundary handling needs extra care; a small slip produces bugs.
        // Here i and j are defined via half-open intervals:
        // [lo, i) <= pivot; (j, hi] > pivot
        // This invariant must be maintained correctly throughout.
        int i = lo + 1, j = hi;
        // Loop until i > j so that the whole range [lo, hi] is covered
        while (i <= j) {
            while (i < hi && nums[i] <= pivot) {
                i++;
                // when this loop ends, nums[i] > pivot
            }
            while (j > lo && nums[j] > pivot) {
                j--;
                // when this loop ends, nums[j] <= pivot
            }
            // now [lo, i) <= pivot && (j, hi] > pivot
            if (i >= j) {
                break;
            }
            swap(nums, i, j);
        }
        // Put the pivot in its final place: smaller elements on its left,
        // larger elements on its right
        swap(nums, lo, j);
        return j;
    }

    // Shuffle: randomly permute the input array
    private static void shuffle(int[] nums) {
        Random rand = new Random();
        int n = nums.length;
        for (int i = 0; i < n; i++) {
            // random index in [i, n - 1]
            int r = i + rand.nextInt(n - i);
            swap(nums, i, r);
        }
    }

    // Swap two array elements in place
    private static void swap(int[] nums, int i, int j) {
        int temp = nums[i];
        nums[i] = nums[j];
        nums[j] = temp;
    }
}
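To check the partition invariant concretely, here is the partition logic above extracted into a standalone sketch, together with a helper that verifies nums[lo..p-1] <= nums[p] < nums[p+1..hi] (the class and helper names are illustrative):

```java
class PartitionCheck {
    static int partition(int[] nums, int lo, int hi) {
        int pivot = nums[lo];
        int i = lo + 1, j = hi;
        while (i <= j) {
            while (i < hi && nums[i] <= pivot) i++;
            while (j > lo && nums[j] > pivot) j--;
            if (i >= j) break;
            swap(nums, i, j);
        }
        swap(nums, lo, j);
        return j;
    }

    static void swap(int[] nums, int i, int j) {
        int t = nums[i]; nums[i] = nums[j]; nums[j] = t;
    }

    // Returns true iff nums[lo..p-1] <= nums[p] < nums[p+1..hi]
    static boolean invariantHolds(int[] nums, int lo, int hi, int p) {
        for (int k = lo; k < p; k++) if (nums[k] > nums[p]) return false;
        for (int k = p + 1; k <= hi; k++) if (nums[k] <= nums[p]) return false;
        return true;
    }
}
```

After one call on {5, 9, 2, 7, 1, 8}, the pivot 5 ends up at the returned index with {2, 1} on its left and {7, 9, 8} on its right, even though neither side is sorted yet.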
Note: the implementation of the core partition function here is similar to binary search. Finding the dividing point correctly tests your control of boundary conditions; a slight mistake produces wrong results.
A trick for handling boundary details: pin down the definition of each variable and whether each end of the interval is open or closed.
There are actually two conventions:
If high = length - 1, the loop must exit when i > j, because exiting at i == j would leave the element that i and j point to uncovered.
If high = length, the loop exits when i == j, because at that point all of [low..high] has been covered, so i == j is exactly the pivot's position.
3. Complexity analysis
Similar to merge sort, this is best analyzed as a whole, together with the traversal of the sorting tree: the partition function runs once per tree node, and each run costs time proportional to the length of nums[lo..hi], so the total time complexity is the total count of "array elements" across the entire tree.
3.1 Time Complexity
Assuming the array has N elements, the sum of element counts across each level of the binary tree is O(N); ideally the tree is balanced, with O(logN) levels, so the ideal total time complexity is O(NlogN).
3.2 Space Complexity
Since quick sort uses no auxiliary array, the space complexity is the depth of the recursion stack, i.e. the tree height O(logN).
3.3 Time Complexity of Special Cases
However, the efficiency of quick sort has some randomness in it. If every partition splits extremely unevenly, quick sort degenerates into something like selection sort: the tree height becomes O(N), the number of elements per level shrinks by one from N, and the total time complexity is N + (N - 1) + (N - 2) + ... + 1 = O(N^2).
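This degeneration can be observed directly. The sketch below instruments the un-shuffled sort with a counter for the total subarray length handled by partition; on an already-sorted array of n elements the pivot is always the minimum, so the cost is n + (n - 1) + ... + 2, i.e. quadratic (class and field names are illustrative):

```java
class WorstCaseDemo {
    static long work = 0; // total subarray length handled by partition

    static void sort(int[] nums, int lo, int hi) {
        if (lo >= hi) return;
        int p = partition(nums, lo, hi);
        sort(nums, lo, p - 1);
        sort(nums, p + 1, hi);
    }

    static int partition(int[] nums, int lo, int hi) {
        work += hi - lo + 1; // each call scans the whole subarray
        int pivot = nums[lo];
        int i = lo + 1, j = hi;
        while (i <= j) {
            while (i < hi && nums[i] <= pivot) i++;
            while (j > lo && nums[j] > pivot) j--;
            if (i >= j) break;
            int t = nums[i]; nums[i] = nums[j]; nums[j] = t;
        }
        int t = nums[lo]; nums[lo] = nums[j]; nums[j] = t;
        return j;
    }

    // Without shuffling, an already-sorted array always picks the
    // smallest element as pivot, so work = n + (n - 1) + ... + 2.
    static long costOfSortedInput(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i;
        work = 0;
        sort(a, 0, n - 1);
        return work;
    }
}
```

For n = 100 this measures 5049 scanned elements (100 + 99 + ... + 2), versus roughly n·log n for a balanced split.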
3.4 In summary
The ideal time complexity of quick sort is O(NlogN) with space complexity O(logN); in the extreme worst case, the time complexity is O(N^2) and the space complexity is O(N).
However, with the randomized partition function the extreme case is very unlikely, so in practice quick sort is still highly efficient.
3.5 Contrast Merge Sort
Quick sort is an "unstable sort"; by contrast, merge sort is a "stable sort".
3.5.1 What is a stable sort? What is an unstable sort?
For equal elements in a sequence, if their relative order does not change after sorting, the algorithm is called a "stable sort"; otherwise it is an "unstable sort".
3.5.2 What are the advantages of stable sorting?
Stability means nothing when sorting a plain int array. But when sorting data with a more complex structure, a stable sort has a real advantage.
For example, suppose you have a batch of order records already sorted by order number, and you now want to sort them by transaction date:
With a stable sorting algorithm (such as merge sort), the orders end up sorted by transaction date, and orders sharing the same transaction date remain in order-number order.
With an unstable algorithm (such as quick sort), the result is still sorted by transaction date, but orders sharing the same transaction date lose their order-number ordering.
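Java's own library illustrates the point: Arrays.sort on an object array is documented to be a stable merge sort variant, so a second sort by date keeps the earlier order-number ordering for equal dates (the order data below is invented for illustration):

```java
import java.util.Arrays;
import java.util.Comparator;

class StableSortDemo {
    public static void main(String[] args) {
        // {orderId, tradeDate} pairs, already sorted by orderId
        int[][] orders = {
            {101, 20230401}, {102, 20230403}, {103, 20230401},
            {104, 20230402}, {105, 20230401}
        };
        // Arrays.sort on objects is stable, so orders sharing a date
        // stay in orderId order after sorting by date.
        Arrays.sort(orders, Comparator.comparingInt((int[] o) -> o[1]));
        for (int[] o : orders) {
            System.out.println(o[0] + " " + o[1]);
        }
    }
}
```

The three orders dated 20230401 come out as 101, 103, 105, preserving their original order-number order.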
3.5.3 Why focus on stability?
In real projects we often sort complex objects by some key field, so it pays to know which sorting algorithm the API provided by the programming language uses under the hood and whether it is stable; this can affect the efficiency and even the correctness of the code.
A variant of the quicksort algorithm: the quickselect algorithm
215. The Kth Largest Element in an Array
There are two approaches to this problem: a binary heap (priority queue) solution, and the quickselect algorithm.
Compare the two solutions:
1. Binary heap:
The binary heap solution is relatively simple, but its time complexity is slightly higher.
int findKthLargest(int[] nums, int k) {
    // Min-heap: the top is the smallest element
    PriorityQueue<Integer> pq = new PriorityQueue<>();
    for (int e : nums) {
        // Every element passes through the heap
        pq.offer(e);
        // When the heap holds more than k elements, remove the top
        if (pq.size() > k) {
            pq.poll();
        }
    }
    // What remains in pq are the k largest elements of nums;
    // the top is the smallest of them, i.e. the kth largest
    return pq.peek();
}
Note: a binary heap (priority queue) is a data structure that keeps its elements ordered automatically.
The analysis: think of the min-heap pq as a sieve: larger elements sink down while smaller elements float to the top. Whenever the heap size exceeds k, we delete the top element, because those elements are relatively small and what we want is the kth largest. (After all n elements have passed through, n - k elements have been deleted, so the k elements remaining in the min-heap are the k largest, and its top is the kth largest.)
Time complexity analysis:
The cost of inserting into or deleting from a binary heap depends on the number of elements in it. Here the heap never holds more than k elements, so each insertion or deletion costs O(logk); wrapped in a for loop over all N array elements, the total time complexity is O(Nlogk).
Space complexity analysis:
The space complexity of this solution is clearly the size of the binary heap, O(k).
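Wrapped into a standalone class, the heap solution above can be exercised like this (the class name is illustrative; the sample inputs are LeetCode 215's examples):

```java
import java.util.PriorityQueue;

class KthLargestHeap {
    static int findKthLargest(int[] nums, int k) {
        // Min-heap: the smallest of the k largest stays on top
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        for (int e : nums) {
            pq.offer(e);
            if (pq.size() > k) {
                pq.poll(); // discard elements too small to be top-k
            }
        }
        return pq.peek();
    }
}
```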
2. Quick Selection
The quickselect algorithm is a variant of quick sort and is more efficient. Being able to write it in an interview is definitely a plus.
Analysis and implementation:
The problem asks for the kth largest element, which is equivalent to finding the element of rank n - k in the array's ascending order.
So how do we find the element of rank n - k directly?
In fact, the partition function at the heart of quick sort already gives us a hint.
As discussed above, the partition function places nums[p] in its correct position, so that nums[low..p-1] <= nums[p] < nums[p+1..high].
At this point, although the whole array is not yet sorted, every element to the left of nums[p] is less than or equal to it, so we already know the rank of nums[p].
So we can compare p with the target rank k' (here k' = n - k): if p < k', the element of rank k' lies in nums[p+1..hi]; if p > k', it lies in nums[lo..p-1].
Going one step further, we run the partition function again on nums[p+1..hi] or nums[lo..p-1], further narrowing the range containing the rank-k' element until we find the target.
public int findKthLargest(int[] nums, int k) {
    // Shuffle first to avoid the worst case
    shuffle(nums);
    int newK = nums.length - k;
    int low = 0, high = nums.length - 1;
    while (low <= high) {
        int p = partition(nums, low, high);
        if (newK < p) {
            high = p - 1;
        } else if (newK > p) {
            low = p + 1;
        } else {
            return nums[p];
        }
    }
    return -1;
}

public int partition(int[] nums, int low, int high) {
    int pivot = nums[low];
    int i = low + 1;
    int j = high;
    while (i <= j) {
        while (i < high && nums[i] <= pivot) {
            i++;
        }
        while (j > low && nums[j] > pivot) {
            j--;
        }
        if (i >= j) {
            break;
        }
        swap(nums, i, j);
    }
    swap(nums, low, j);
    return j;
}

public void shuffle(int[] nums) {
    Random random = new Random();
    int n = nums.length;
    for (int i = 0; i < n; i++) {
        int r = i + random.nextInt(n - i);
        swap(nums, i, r);
    }
}

public void swap(int[] nums, int a, int b) {
    int temp = nums[a];
    nums[a] = nums[b];
    nums[b] = temp;
}
This code framework is very similar to the binary search code mentioned earlier, which is part of why this algorithm is efficient.
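Gathered into one standalone class so the sketch compiles on its own, the quickselect code above can be exercised on LeetCode 215's sample inputs:

```java
import java.util.Random;

class QuickSelectDemo {
    static int findKthLargest(int[] nums, int k) {
        shuffle(nums);
        int target = nums.length - k; // rank in ascending order
        int lo = 0, hi = nums.length - 1;
        while (lo <= hi) {
            int p = partition(nums, lo, hi);
            if (target < p) hi = p - 1;
            else if (target > p) lo = p + 1;
            else return nums[p];
        }
        return -1;
    }

    static int partition(int[] nums, int lo, int hi) {
        int pivot = nums[lo];
        int i = lo + 1, j = hi;
        while (i <= j) {
            while (i < hi && nums[i] <= pivot) i++;
            while (j > lo && nums[j] > pivot) j--;
            if (i >= j) break;
            swap(nums, i, j);
        }
        swap(nums, lo, j);
        return j;
    }

    static void shuffle(int[] nums) {
        Random rand = new Random();
        for (int i = 0; i < nums.length; i++) {
            swap(nums, i, i + rand.nextInt(nums.length - i));
        }
    }

    static void swap(int[] nums, int i, int j) {
        int t = nums[i]; nums[i] = nums[j]; nums[j] = t;
    }
}
```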
Time Complexity Analysis
Clearly the running time of this algorithm is also concentrated in the partition function, so we need to estimate how many times it runs and what each run costs.
In the best case, each partition point p lands exactly at the middle index (lo + hi) / 2, and after each split we recurse into only one of the two subarrays, so each call handles an input half the size of the previous one. The total time is therefore the geometric series:
N + N/2 + N/4 + N/8 + ... + 1 = 2N = O(N)
Similar to quick sort, the partition function can also hit extreme cases: in the worst case p is always lo + 1 or always hi - 1, and the time complexity degenerates to O(N^2), since the total time becomes:
N + (N - 1) + (N - 2) + ... + 1 = O(N^2)
To avoid this worst case, the code shuffles the array first, introducing randomness so that extreme cases are vanishingly unlikely and the algorithm stays efficient. The expected complexity of randomized quickselect can be taken as O(N).