Sorting Algorithms - Quick Sort (Lomuto and Hoare Partitioning)

TIP: The author plans to sync more articles to javgo.cn; the content there is still limited and will be expanded over time.

1. Quick sort concept

Quicksort is an efficient general-purpose sorting algorithm invented by British computer scientist Tony Hoare in 1960 and published in 1961. Although many new sorting algorithms have been invented, quick sort is still one of the commonly used sorting algorithms in modern computing. In practice, for random data, it is usually faster than merge sort and heap sort, especially when the amount of data is large.

Quick sort uses a "divide and conquer" strategy. Its basic idea is to select a "pivot" element from the array to be sorted and then split the remaining elements into two sub-arrays according to how they compare with the pivot. After each partition operation, the pivot element is placed in its final position. The algorithm then recursively sorts the two subarrays until the length of a subarray is less than or equal to 1. Because of this partitioning and exchanging behaviour, quicksort is sometimes called partition-exchange sort.

Quick sort is a comparison sort because its core operation is to compare array elements with selected reference elements. If an element is less than the pivot element, it will be placed in the left subarray; if it is greater than or equal to the pivot element, it will be placed in the right subarray. It is important to note that due to the way this compare-and-swap is done, quicksort can be unstable. This means that equal elements in the original data may change their relative order after sorting.

From a performance perspective, the quicksort algorithm takes O(n log n) comparisons on average to sort n items, and in the worst case it does O(n^2) comparisons.
Sorting_quicksort_anim.gif
Quick sort in action (the horizontal lines mark the pivot values)

2. Algorithm implementation

As we said above, quicksort is a sorting method based on the "divide and conquer" strategy. It sorts the array through a step called "partitioning". There are several variants of quicksort because partitioning may be implemented differently, but they all follow the same basic principles.

In simple terms, when we sort an array with at least two elements, quicksort divides it into two subarrays, making sure that no element in the first subarray is greater than any element in the second subarray. It then recursively does the same for the two subarrays.

The following are the basic steps of quicksort:

  1. Check the length of the array : if the array contains fewer than two elements, return directly without sorting. For very short arrays, other sorting methods are sometimes used instead.
  2. Select a pivot : choose one value from the array as the pivot. How it is chosen varies between implementations and may even be random.
  3. Partition : rearrange the array around the pivot, ensuring that all elements smaller than the pivot end up to its left and all elements greater than the pivot end up to its right. Elements equal to the pivot can be placed on either side.
  4. Confirm the pivot's position : most partitioning methods guarantee that the pivot lands in its final sorted position after one round of partitioning.
  5. Recursive sort : recursively apply quicksort to the subarrays to the left and right of the pivot.

It is worth noting that the choice of pivot and the specific partitioning method can affect the efficiency of quicksort to varying degrees. Therefore, when we talk about the performance of quicksort, we need to be clear about the specific methods and strategies used.
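
To make the recursive structure concrete before looking at specific partition schemes, here is a minimal sketch of the skeleton these steps describe. The class and method names are chosen only for illustration, and the partition method is deliberately left as a placeholder: sections 2.1 and 2.2 below give the two real implementations (Lomuto and Hoare).

/**
 * Minimal sketch of the recursive skeleton described by the steps above.
 * The partition method is only a placeholder here; the Lomuto and Hoare
 * schemes in sections 2.1 and 2.2 provide real implementations.
 */
public class QuickSortSkeleton {

    public static void quickSort(int[] arr, int low, int high) {
        // Step 1: fewer than two elements in the range -> nothing to sort
        if (low >= high) {
            return;
        }
        // Steps 2-4: choose a pivot and rearrange the range around it,
        // obtaining a split index p
        int p = partition(arr, low, high);
        // Step 5: recursively sort the two sub-ranges.
        // With a Lomuto-style partition arr[p] is already in its final place,
        // so the left call uses p - 1; with Hoare's scheme the left call
        // would use p itself (see section 2.2).
        quickSort(arr, low, p - 1);
        quickSort(arr, p + 1, high);
    }

    // Placeholder to be replaced by a concrete scheme from section 2.1 or 2.2.
    private static int partition(int[] arr, int low, int high) {
        throw new UnsupportedOperationException("see sections 2.1 and 2.2");
    }
}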

The following is a complete diagram of quicksort applied to an array in random order. The dark element in each frame is the selected pivot, which here is always the last element of the current partition.
image.png
Note ⚠️:
In the quick sort algorithm, if we always select the last element of the partition as the pivot, then in certain specific cases (such as an already sorted array, or an array in which all elements are equal) this selection strategy degrades the algorithm's performance, and the time complexity becomes O(n^2). This is because in these cases every partition is extremely uneven, which increases the recursion depth and therefore the overall computation time.

This is easy to see with an example. Suppose we have an already sorted array:

[1,2,3,4,5,6,7,8,9,10]

If we always choose the last element as the pivot for partitioning, then for the first partition, we choose 10 as the pivot. This causes the array to be split into two parts:

Left subarray: [1,2,3,4,5,6,7,8,9]
Right subarray: [10] (position fixed)

As can be seen, this partition is very uneven: the left part contains 9 elements, while the right part has only 1. Next, we continue to quicksort the left part. Choosing the last element, 9, as the pivot again, the array is divided into:

Left subarray: [1,2,3,4,5,6,7,8]
Right subarray: [9] (position fixed)

This pattern continues: selecting the last element as the pivot every time results in a very uneven partition at every step. Since only one element's final position (the pivot's) is determined per partition, we need about n levels of recursive calls to sort the entire array, where n is the length of the array. That is why the recursion depth grows and the total computation time increases.

In practice, especially when dealing with large data sets, it is common to encounter already sorted subarrays or subarrays made up of identical elements. In such cases, choosing the middle element as the pivot performs better than choosing the last element, because the middle element is more likely to produce an even partition, which reduces the recursion depth and the overall computation time. A small sketch of this middle-pivot idea follows below.
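
As a rough sketch of this idea (an illustrative assumption, not part of the original code), a last-element scheme such as the Lomuto partition in section 2.1 below can use the middle element as the pivot simply by swapping that element to the end first. The method below is written as if it were added to the QuickSortLomuto class shown in section 2.1, so it can reuse that class's partition and swap helpers:

// Sketch: use the middle element as the pivot with a last-element scheme
// by first swapping it to the end of the range. Illustrative only; the
// partition and swap methods are the ones defined in section 2.1 below.
private static int partitionWithMiddlePivot(int[] arr, int low, int high) {
    // Index of the middle element (written this way to avoid integer overflow, as discussed later)
    int mid = low + (high - low) / 2;
    // Move the middle element to the end so the usual "last element as pivot" logic applies
    swap(arr, mid, high);
    return partition(arr, low, high);
}

On the sorted array [1,2,...,10] above, this selects 5 as the pivot, so the first partition splits the range into parts of 4 and 5 elements instead of 9 and 0.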

Below we describe two specific partitioning methods, the "Lomuto Partitioning Scheme" and the "Hoare Partitioning Scheme".

2.1 Lomuto partition scheme

The Lomuto partition scheme is a quicksort implementation that chooses the last element of the array as the pivot. In this scheme, we use two indices, i and j, to traverse the array: the elements at indices low through i are all less than or equal to the pivot, while the elements from index i + 1 up to j - 1 are all greater than the pivot. This scheme is relatively concise and easy to understand, so it is often used in textbooks for beginners.

TIP: Please keep in mind the note above: when the array is already sorted or all elements are equal, the performance of this scheme degrades, and the time complexity can reach O(n^2).

Here is the Java implementation of the scheme:

import java.util.Arrays;

/**
 * Quick sort (Lomuto partition scheme)
 */
public class QuickSortLomuto {

    /**
     * Partition function: determines the final position (index) of the pivot.
     * @param arr  array to partition
     * @param low  lower bound of the current partition
     * @param high upper bound of the current partition
     * @return final index of the pivot
     */
    private static int partition(int[] arr, int low, int high) {
        // Lomuto scheme: choose the rightmost element as the pivot
        int pivot = arr[high];

        // Temporary pivot index i: points at the last element known to be <= pivot.
        // It is incremented before use, so it starts at low - 1 (one before the range).
        int i = low - 1;

        // Index j scans the array looking for elements <= pivot
        for (int j = low; j < high; j++) {
            // If arr[j] is not greater than the pivot...
            if (arr[j] <= pivot) {
                // ...advance i and swap so that arr[low..i] stays <= pivot
                i++;
                swap(arr, i, j);
            }
        }

        // Move the pivot between the smaller and larger elements
        i++;
        swap(arr, i, high);

        // Return the pivot's final position (index)
        return i;
    }

    /**
     * Swap two elements of the array.
     * @param arr array
     * @param i   index of one element
     * @param j   index of the other element
     */
    private static void swap(int[] arr, int i, int j) {
        int temp = arr[i];
        arr[i] = arr[j];
        arr[j] = temp;
    }

    /**
     * Quick sort.
     * @param arr  array to sort
     * @param low  lower bound
     * @param high upper bound
     */
    public static void quickSort(int[] arr, int low, int high) {
        // Recursion terminates when the range has at most one element
        if (low >= high) {
            return;
        }

        // Partition and get the pivot's final position
        int pivot = partition(arr, low, high);

        // Recursively sort the left subarray
        quickSort(arr, low, pivot - 1);

        // Recursively sort the right subarray
        quickSort(arr, pivot + 1, high);
    }

    public static void main(String[] args) {
        int[] arr = {3, 7, 8, 5, 2, 1, 9, 5, 4};
        quickSort(arr, 0, arr.length - 1);
        System.out.println(Arrays.toString(arr));
    }
}

To better understand the sorting process of the code above, let's walk through it again with the following figure:
Quick sort.png

2.2 Hoare partition scheme

The original quicksort partitioning scheme proposed by Tony Hoare is a neat way to partition an array using two pointers.

  1. How to move : The two pointers start from the two ends of the array and gradually move closer to each other.
  2. What is their purpose: Their task is to find a reversed pair where the element on the left is greater than the pivot value and the element on the right is less than the pivot value.
  3. How they act : when such a reversed pair is found and the left pointer is still to the left of the right pointer, the two elements are swapped, because they are out of order relative to each other.
  4. Subsequent process : the two pointers then continue to move towards the middle, repeating the process, until they cross.
  5. How it ends : once the pointers cross, the partition is complete, and the crossing point is the boundary of the partition.

Below is an animated demonstration of quicksort using the Hoare partitioning scheme. The red outlines show the positions of the left and right pointers (i and j respectively), the black outlines show the positions of the sorted elements, and the filled black square shows the current pivot value.


However, there is a problem to be aware of. Sometimes this method of partitioning can leave one subrange equal to the entire original range, which means no real partitioning has happened. To solve this problem, Hoare proposed a fix: after partitioning is complete, the subrange containing the pivot element can be shrunk by excluding the pivot itself; if necessary, the pivot can also be swapped with the element closest to the split point, so that quicksort is guaranteed to terminate.

Let's say we have the following array, which we want to sort quickly:

A = [8(i), 7, 6, 5(pivot), 4, 3, 2, 1(j)]

Suppose we choose the middle element 5 as the pivot. Now, we use Hoare's method, starting from both ends, to find reversed pairs.

  • Starting from the left, we find that the first element greater than 5 is 8
  • Starting from the right, we find the first element less than 5 is 1

We swap these two elements to get:

A = [1(i), 7, 6, 5(pivot), 4, 3, 2, 8(j)]

Continuing this process, the left pointer will move to 7 and the right pointer will move to 2, satisfying the reverse pair condition, and then swap them:

A = [1, 2(i), 6, 5(pivot), 4, 3, 7(j), 8]

Continuing again, the left pointer will move to 6 and the right pointer will move to 3, satisfying the reverse pair condition, and then swap them:

A = [1, 2, 3(i), 5(pivot), 4, 6(j), 7, 8]

Continuing again, the left pointer will move to 5 and the right pointer will move to 4, satisfying the reverse pair condition, and then swap them:

A = [1, 2, 3, 4(i), 5(j), 6, 7, 8]

Finally, the left pointer continues to move to the right, and the left and right pointers meet at the element 5, which marks the boundary of the partition:

A = [1, 2, 3, 4, 5(i,j), 6, 7, 8]

However, notice that apart from the pivot 5 and the elements 6, 7 and 8, everything is less than 5. This means that if we simply continued the algorithm in this form (starting again over the whole range from the left and right ends), the left subrange [1, 2, 3, 4] plus the pivot would cover essentially the entire remaining range and (aside from 6, 7 and 8) nothing further would be swapped, so no effective partitioning would actually take place.

To solve this problem, Hoare proposed a fix: after the partition is complete, we can exclude the pivot element 5 so that the left subrange no longer includes it. This means that on the next recursive call, we only need to sort the subrange [1, 2, 3, 4] instead of the entire array.

In addition, although Hoare's original description is very intuitive, in practice developers usually make some small adjustments to improve efficiency. For example, to simplify the implementation, we can also include elements equal to the pivot in the detection of reversed pairs. This allows us to use "greater than or equal to" and "less than or equal to" conditions instead of strict "greater than" and "less than". This adjustment, while seemingly trivial, actually simplifies the code and ensures that the pointers do not run past the ends of the array.

Finally, to ensure that partitioning always makes progress, we need to ensure that the selected pivot element is not the last element in the range. If the pivot is the last element and all other elements are smaller than it, the partition does not advance. To avoid this, we can select the element in the middle of the range as the pivot.

Here is the Java implementation of the scheme:

import java.util.Arrays;

/**
 * Quick sort (Hoare partition scheme)
 */
public class QuickSortHoare {

    /**
     * Partition function: returns the index of the partition point.
     * @param arr  array to partition
     * @param low  lower bound of the current partition
     * @param high upper bound of the current partition
     * @return index of the partition point
     */
    private static int partition(int[] arr, int low, int high) {
        // Hoare scheme: choose the middle element as the pivot
        int pivot = arr[low + (high - low) / 2];

        // Initialize the two pointers
        int i = low - 1;    // left pointer: looks for elements >= pivot
        int j = high + 1;   // right pointer: looks for elements <= pivot

        // Loop until the pointers meet (the partition point is found)
        while (true) {
            // Move the left pointer right until it finds an element >= pivot
            do {
                i++;
            } while (arr[i] < pivot);

            // Move the right pointer left until it finds an element <= pivot
            do {
                j--;
            } while (arr[j] > pivot);

            // If the pointers have met or crossed, j is the partition point
            if (i >= j) {
                // Why j rather than i is returned is explained below
                return j;
            }

            // A reversed pair has been found: swap the two elements
            int temp = arr[i];
            arr[i] = arr[j];
            arr[j] = temp;
        }
    }

    /**
     * Quick sort.
     * @param arr  array to sort
     * @param low  lower bound of the current partition
     * @param high upper bound of the current partition
     */
    public static void quickSort(int[] arr, int low, int high) {
        // Make sure the indices describe a valid range of at least two elements
        if (low >= 0 && high < arr.length && low < high) {
            // Partition the range and get the partition point
            int pivot = partition(arr, low, high);

            // Recursively sort the left partition (which includes the partition point)
            quickSort(arr, low, pivot);

            // Recursively sort the right partition
            quickSort(arr, pivot + 1, high);
        }
    }

    public static void main(String[] args) {
        int[] arr = {6, 5, 3, 1, 8, 7, 2, 4};
        quickSort(arr, 0, arr.length - 1);
        System.out.println(Arrays.toString(arr));
    }
}

Two points about the above implementation deserve separate explanation.

  1. Why use low + (high - low) / 2 instead of (low + high) / 2 to calculate the intermediate value?

This is to prevent integer overflow. When low and high are both very large (close to Integer.MAX_VALUE), their sum may exceed the maximum value an int can hold, causing an overflow. The overflowed sum makes the calculated midpoint negative or otherwise incorrect.

For example, suppose low = Integer.MAX_VALUE - 1 and high = Integer.MAX_VALUE; then (low + high) will overflow. Using low + (high - low) / 2 avoids this problem, because high - low is a small positive number, and adding half of it to low still yields a legal integer.

So, to ensure the robustness of the code, especially when dealing with large index ranges, we usually compute the midpoint as low + (high - low) / 2, which is mathematically equivalent to (low + high) / 2 but cannot overflow.
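
A tiny throwaway demonstration of the difference (the class name is made up for this example):

public class MidpointOverflowDemo {

    public static void main(String[] args) {
        int low = Integer.MAX_VALUE - 1;   // 2147483646
        int high = Integer.MAX_VALUE;      // 2147483647

        // low + high wraps around to a negative int (-3), so the "midpoint" comes out wrong
        System.out.println((low + high) / 2);        // prints -1

        // high - low is just 1 here, so this form stays within the int range
        System.out.println(low + (high - low) / 2);  // prints 2147483646
    }
}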

  2. Why is return j chosen as the partition point instead of return i when the pointers meet?

In the Hoare partitioning scheme, our goal is to find a position j such that all elements at or to the left of this position are less than or equal to the pivot, and all elements to the right of it are greater than or equal to the pivot.

In the loop, the i pointer moves from left to right until it finds an element greater than or equal to the pivot, and the j pointer moves from right to left until it finds an element less than or equal to the pivot. When the i and j pointers meet or cross, we know that everything at or to the left of j is less than or equal to the pivot, and everything at or to the right of i is greater than or equal to the pivot.

Therefore, when the pointers cross, j is the position of the last element that is less than or equal to the pivot, which is exactly the partition point we want. i, by contrast, is the position of the first element that is greater than or equal to the pivot, so it is not a suitable partition point.

Selecting j as the partition point ensures that all elements of the left subarray are less than or equal to the pivot, and all elements of the right subarray are greater than or equal to the pivot, which is the core idea of the quicksort algorithm.
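
To see what the returned j actually guarantees, here is a small self-contained sketch. The test array {2, 3, 1, 4} is an assumed example, and the partition method is simply a copy of the one shown above:

import java.util.Arrays;

/**
 * Sketch: run the Hoare partition once on a small example array and
 * inspect what the returned index j means. The partition method below
 * is a copy of the one shown above.
 */
public class HoarePartitionDemo {

    static int partition(int[] arr, int low, int high) {
        int pivot = arr[low + (high - low) / 2];
        int i = low - 1;
        int j = high + 1;
        while (true) {
            do { i++; } while (arr[i] < pivot);
            do { j--; } while (arr[j] > pivot);
            if (i >= j) {
                return j;
            }
            int temp = arr[i];
            arr[i] = arr[j];
            arr[j] = temp;
        }
    }

    public static void main(String[] args) {
        int[] arr = {2, 3, 1, 4};                  // the pivot will be arr[1] = 3
        int j = partition(arr, 0, arr.length - 1);
        System.out.println(Arrays.toString(arr));  // [2, 1, 3, 4]
        System.out.println("j = " + j);            // j = 1, yet arr[1] = 1 is not the pivot
        // Only the weaker guarantee holds: every element in arr[low..j] is
        // <= every element in arr[j+1..high]. That is why the recursion is
        // quickSort(arr, low, j) and quickSort(arr, j + 1, high), not low..j-1.
    }
}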


To better understand the sorting process of the code above, I chose the array to be sorted to match the following animation, so it can be followed step by step with the figures below:
Quicksort-example.gif

TIP: The red outlines show the positions of the left and right pointers (i and j, respectively), the black outlines show the positions of the sorted elements, and the filled black square shows the current pivot value.

The array to be sorted at the beginning:
image.png
Determine the pivot element 3 and the positions of the pointers at the left and right ends:
image.png
The element 6 pointed to by the left pointer i is already greater than the pivot 3, so it stays put; the right pointer j moves to the left until it finds the element 2, which is smaller than the pivot:
image.png
Swap the elements at the i and j pointers:
image.png
The pointers continue to move and find the next reversed pair:
image.png
Swap the elements at the i and j pointers:
image.png
The two pointers keep moving towards each other and find the partition point:
image.png
At this point, everything at or to the right of i is greater than or equal to the pivot, and everything at or to the left of j is less than or equal to the pivot. j is the position of the last element that is less than or equal to the pivot, which is exactly the partition point we want, whereas i is the position of the first element that is greater than or equal to the pivot, so it is not a suitable partition point. We therefore select the index j as the boundary between the two partitions:
image.png
The left sub-partition is now sorted recursively: the pivot is 1, and with the pointers at the two ends the reversed-pair condition is met immediately:
image.png
Swap the elements at the i and j pointers:
image.png
The two pointers continue to move towards each other, find the partition point at 1, and fix that element in its final position:
image.png
Now only the right sub-partition of this range remains; it contains a single element, so the index-range check fails, no sorting is needed, and the element 2 is already in its final position:
image.png
... (the passes over the right sub-partition of the original array are omitted.)

2.3 Hoare vs Lomuto

Through the above understanding, we know that quick sort is a classic sorting algorithm, and one of the key steps is how to partition the array. There are two popular partitioning strategies: Hoare and Lomuto. These two strategies have their own advantages and disadvantages, but in terms of efficiency, Hoare's scheme is usually better.

  1. Efficiency comparison:
    • Hoare partitioning scheme: on average, Hoare's partitioning strategy performs only about one-third as many swaps as Lomuto's scheme, which makes it more efficient in practice.
    • Lomuto's scheme: although it is more intuitive and simpler to implement, it may not produce balanced partitions in some cases, especially when all values are equal.
  2. For sorted input:
    • If the first or last element is chosen as the pivot, the performance of quicksort drops to O(n^2), whether Hoare's or Lomuto's partitioning strategy is used.
    • However, if the middle element is chosen as the pivot, Hoare's scheme performs almost no swaps on already sorted data and produces partitions of approximately equal size. This yields quicksort's best-case performance of O(n log(n)).
  3. Stability:
    • With either Hoare's or Lomuto's partitioning strategy, quicksort is not stable. This means that equal elements may change their relative order.
  4. The final location of the pivot:
    • In Hoare's partitioning strategy, the pivot's final location is not necessarily the returned index. This is because the pivot, and elements equal to it, may end up anywhere within either partition after the partitioning step, and may not reach their final positions until the recursion bottoms out at single-element ranges.

Conclusion: While Lomuto's partitioning strategy may be easier to understand in some teaching scenarios, Hoare's partitioning strategy is often more efficient in practice.
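
The swap-count difference is easy to probe with a rough, self-contained sketch like the one below. The class name, array size, and run count are arbitrary assumptions; the exact ratio varies with the input, but on random data Hoare typically needs noticeably fewer swaps than Lomuto:

import java.util.Random;

/**
 * Rough sketch: count the swaps performed by Lomuto- and Hoare-based
 * quicksorts on the same random inputs. Not a rigorous benchmark.
 */
public class PartitionSwapCount {

    static long lomutoSwaps = 0;
    static long hoareSwaps = 0;

    static void lomutoSort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int pivot = a[hi];
        int i = lo - 1;
        for (int j = lo; j < hi; j++) {
            if (a[j] <= pivot) {
                i++;
                swap(a, i, j);
                lomutoSwaps++;
            }
        }
        i++;
        swap(a, i, hi);
        lomutoSwaps++;
        lomutoSort(a, lo, i - 1);
        lomutoSort(a, i + 1, hi);
    }

    static void hoareSort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int pivot = a[lo + (hi - lo) / 2];
        int i = lo - 1, j = hi + 1;
        while (true) {
            do { i++; } while (a[i] < pivot);
            do { j--; } while (a[j] > pivot);
            if (i >= j) break;
            swap(a, i, j);
            hoareSwaps++;
        }
        hoareSort(a, lo, j);
        hoareSort(a, j + 1, hi);
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int run = 0; run < 100; run++) {
            int[] data = rnd.ints(1_000).toArray();
            lomutoSort(data.clone(), 0, data.length - 1);
            hoareSort(data.clone(), 0, data.length - 1);
        }
        System.out.println("Lomuto swaps: " + lomutoSwaps);
        System.out.println("Hoare swaps:  " + hoareSwaps);
    }
}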

3. Avoid possible instability

As mentioned above, quicksort is not stable with either Hoare's or Lomuto's partition strategy, because equal elements in the original data may change their relative order after sorting. One small mitigation is to check, before each exchange, whether the two elements being swapped are equal, and skip the swap when they are. Note that this only avoids pointless swaps between equal values; it does not make quicksort fully stable in the general case.

Taking the Lomuto partition scheme as an example, the optimized code is as follows:

import java.util.Arrays;

/**
 * Quick sort (Lomuto partition scheme, skipping swaps of equal elements)
 */
public class QuickSortLomuto {

    /**
     * Partition function: determines the final position (index) of the pivot.
     * @param arr  array to partition
     * @param low  lower bound of the current partition
     * @param high upper bound of the current partition
     * @return final index of the pivot
     */
    private static int partition(int[] arr, int low, int high) {
        // Lomuto scheme: choose the rightmost element as the pivot
        int pivot = arr[high];

        // Temporary pivot index i: points at the last element known to be <= pivot.
        // It is incremented before use, so it starts at low - 1.
        int i = low - 1;

        // Index j scans the array looking for elements <= pivot
        for (int j = low; j < high; j++) {
            // If arr[j] is not greater than the pivot...
            if (arr[j] <= pivot) {
                // ...advance i so that arr[low..i] stays <= pivot
                i++;
                // Only swap if the two elements actually differ,
                // to avoid needless exchanges between equal values
                if (arr[i] != arr[j]) {
                    swap(arr, i, j);
                }
            }
        }

        // Move the pivot between the smaller and larger elements
        i++;
        // Again, skip the swap if the values are already equal
        if (arr[i] != arr[high]) {
            swap(arr, i, high);
        }

        // Return the pivot's final position (index)
        return i;
    }

    /**
     * Swap two elements of the array.
     * @param arr array
     * @param i   index of one element
     * @param j   index of the other element
     */
    private static void swap(int[] arr, int i, int j) {
        int temp = arr[i];
        arr[i] = arr[j];
        arr[j] = temp;
    }

    /**
     * Quick sort.
     * @param arr  array to sort
     * @param low  lower bound
     * @param high upper bound
     */
    public static void quickSort(int[] arr, int low, int high) {
        // Recursion terminates when the range has at most one element
        if (low >= high) {
            return;
        }

        // Partition and get the pivot's final position
        int pivot = partition(arr, low, high);

        // Recursively sort the left subarray
        quickSort(arr, low, pivot - 1);

        // Recursively sort the right subarray
        quickSort(arr, pivot + 1, high);
    }

    public static void main(String[] args) {
        int[] arr = {3, 7, 8, 5, 2, 1, 9, 5, 4};
        quickSort(arr, 0, arr.length - 1);
        System.out.println(Arrays.toString(arr));
    }
}

4. Small gift

When learning data structures or algorithms, it is inevitable to feel that some concepts are a little abstract. A tool that makes those abstract concepts concrete helps a lot, and here is a website that does exactly that:

Address: Data Structure Visualization

This website brings together most of the common data structures and algorithms with animations and step-by-step debugging. Using them well can help you better understand the corresponding knowledge points.

For example, common data structures:
image.png
Another example is the quick sort we mentioned above:
image.png
OK, that's it for this post. If you found it helpful, remember to give it a like, a favorite, and a follow!

