Sorting algorithms with O(n log n) time complexity | JD Logistics Technology Team

Merge sort

Merge sort follows the divide-and-conquer approach: decompose the original problem into several smaller sub-problems of the same kind, solve these sub-problems recursively, and then merge their solutions into a solution to the original problem. The steps of merge sort are as follows:

  • Divide: Split the sequence of n elements to be sorted into two subsequences of about n/2 elements each, turning the problem of sorting a long array into that of sorting shorter arrays. The recursion stops when the subsequence to be sorted has length 1.

  • Merge: Merge the two sorted subsequences to produce the sorted result.

The code implementation of merge sort is as follows:

    private void sort(int[] nums, int left, int right) {
        if (left >= right) {
            return;
        }

        // Divide
        int mid = left + right >> 1;
        sort(nums, left, mid);
        sort(nums, mid + 1, right);
        // Merge
        merge(nums, left, mid, right);
    }

    private void merge(int[] nums, int left, int mid, int right) {
        // Auxiliary array holding a copy of nums[left..right] (requires java.util.Arrays)
        int[] temp = Arrays.copyOfRange(nums, left, right + 1);

        int leftBegin = 0, leftEnd = mid - left;
        int rightBegin = leftEnd + 1, rightEnd = right - left;
        for (int i = left; i <= right; i++) {
            if (leftBegin > leftEnd) {
                nums[i] = temp[rightBegin++];
            } else if (rightBegin > rightEnd || temp[leftBegin] <= temp[rightBegin]) {
                // "<=" takes equal elements from the left half first, which keeps the sort stable
                nums[i] = temp[leftBegin++];
            } else {
                nums[i] = temp[rightBegin++];
            }
        }
    }

The most attractive property of merge sort is that it guarantees that the time required to sort an array of length n is proportional to nlogn; its main disadvantage is that the additional space required is proportional to n.
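The n log n bound can be recovered from the recurrence that the divide and merge steps imply. The derivation below is only a sketch under simplifying assumptions that are not in the original article: n is a power of two, and merging n elements costs cn for some constant c.

```latex
\begin{aligned}
T(n) &= 2\,T(n/2) + cn \\
     &= 4\,T(n/4) + 2cn \\
     &= \dots \\
     &= 2^{k}\,T(n/2^{k}) + k\,cn \\
     &= n\,T(1) + cn\log_2 n \qquad (k = \log_2 n) \\
     &= O(n \log n)
\end{aligned}
```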

Algorithm characteristics:

  • Time complexity: always O(n log n), regardless of the input

  • Space complexity: Merging relies on an auxiliary array, which takes O(n) extra space; the recursion depth is log n, which takes O(log n) stack space. Dropping the lower-order term, the space complexity is O(n)

  • Non-in-place sorting

  • Stable sorting

  • Non-adaptive sorting
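Stability hinges on how the merge step breaks ties. The following minimal sketch (the class StableMergeDemo and the (key, label) pair layout are illustrative assumptions, not part of the original article) sorts pairs by key only and takes equal keys from the left half first, so pairs with equal keys keep their input order:

```java
import java.util.Arrays;

/** Hypothetical demo: merge sort over (key, label) pairs to make stability visible. */
class StableMergeDemo {
    /** Sorts by items[i][0] only; items[i][1] is a label recording the input order. */
    static void sort(int[][] items) {
        if (items.length < 2) {
            return;
        }
        int mid = items.length / 2;
        int[][] left = Arrays.copyOfRange(items, 0, mid);
        int[][] right = Arrays.copyOfRange(items, mid, items.length);
        sort(left);
        sort(right);

        int l = 0, r = 0;
        for (int i = 0; i < items.length; i++) {
            // "<=" takes ties from the left half first, which preserves input order
            if (r >= right.length || (l < left.length && left[l][0] <= right[r][0])) {
                items[i] = left[l++];
            } else {
                items[i] = right[r++];
            }
        }
    }

    public static void main(String[] args) {
        int[][] items = {{2, 0}, {1, 1}, {2, 2}, {1, 3}};
        sort(items);
        // prints [[1, 1], [1, 3], [2, 0], [2, 2]]
        System.out.println(Arrays.deepToString(items));
    }
}
```

Replacing "<=" with "<" in the comparison would still sort the keys correctly, but equal keys could come out in a different order than they went in.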

The above code is a common implementation of merge sort. Let's look at some ways to optimize it:

Create one large array once instead of many small arrays

In the above implementation, every merge of two sorted subarrays allocates a new temp[] array, even when the subarrays are tiny. These repeated allocations take up a noticeable share of merge sort's running time. A better approach is to allocate the temp[] array only once, in the top-level method that starts the sort, and pass it as a parameter to sort() and merge(), as follows:

    private void sort(int[] nums, int left, int right, int[] temp) {
        if (left >= right) {
            return;
        }

        // Divide
        int mid = left + right >> 1;
        sort(nums, left, mid, temp);
        sort(nums, mid + 1, right, temp);
        // Merge
        merge(nums, left, mid, right, temp);
    }

    private void merge(int[] nums, int left, int mid, int right, int[] temp) {
        System.arraycopy(nums, left, temp, left, right - left + 1);
        int l = left, r = mid + 1;
        for (int i = left; i <= right; i++) {
            if (l > mid) {
                nums[i] = temp[r++];
            } else if (r > right || temp[l] <= temp[r]) {
                // "<=" takes equal elements from the left half first, which keeps the sort stable
                nums[i] = temp[l++];
            } else {
                nums[i] = temp[r++];
            }
        }
    }
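The snippet above leaves open where temp[] is allocated. One possible entry point is sketched below; the class name MergeSortOnce and the method name mergeSort are assumptions made for illustration, and the merge uses "<=" so that ties are taken from the left half.

```java
import java.util.Arrays;

/** Hypothetical wrapper showing a single allocation of the auxiliary array. */
class MergeSortOnce {
    /** Entry point: allocates temp exactly once and shares it with every merge. */
    static void mergeSort(int[] nums) {
        if (nums == null || nums.length < 2) {
            return;
        }
        int[] temp = new int[nums.length];
        sort(nums, 0, nums.length - 1, temp);
    }

    private static void sort(int[] nums, int left, int right, int[] temp) {
        if (left >= right) {
            return;
        }
        int mid = left + (right - left) / 2;
        sort(nums, left, mid, temp);
        sort(nums, mid + 1, right, temp);
        merge(nums, left, mid, right, temp);
    }

    private static void merge(int[] nums, int left, int mid, int right, int[] temp) {
        // Copy the current range into the shared buffer at the same indices
        System.arraycopy(nums, left, temp, left, right - left + 1);
        int l = left, r = mid + 1;
        for (int i = left; i <= right; i++) {
            if (l > mid) {
                nums[i] = temp[r++];
            } else if (r > right || temp[l] <= temp[r]) {
                nums[i] = temp[l++];
            } else {
                nums[i] = temp[r++];
            }
        }
    }

    public static void main(String[] args) {
        int[] nums = {5, 2, 9, 1, 5, 6};
        mergeSort(nums);
        System.out.println(Arrays.toString(nums)); // prints [1, 2, 5, 5, 6, 9]
    }
}
```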

When the array is sorted, skip the merge() method

We can add a check before merging: if nums[mid] <= nums[mid + 1], the range [left, right] is already sorted and the merge() call can be skipped. This does not change the recursive calls, but the running time for any already-sorted subarray becomes linear. The code is as follows:

    private void sort(int[] nums, int left, int right, int[] temp) {
        if (left >= right) {
            return;
        }

        // Divide
        int mid = left + right >> 1;
        sort(nums, left, mid, temp);
        sort(nums, mid + 1, right, temp);
        // Merge only when the two halves are not already in order
        if (nums[mid] > nums[mid + 1]) {
            merge(nums, left, mid, right, temp);
        }
    }

    private void merge(int[] nums, int left, int mid, int right, int[] temp) {
        System.arraycopy(nums, left, temp, left, right - left + 1);
        int l = left, r = mid + 1;
        for (int i = left; i <= right; i++) {
            if (l > mid) {
                nums[i] = temp[r++];
            } else if (r > right || temp[l] <= temp[r]) {
                // "<=" takes equal elements from the left half first, which keeps the sort stable
                nums[i] = temp[l++];
            } else {
                nums[i] = temp[r++];
            }
        }
    }
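To see the effect of the check, the sketch below counts how often merge() actually runs. The class SkipMergeDemo and its counter are hypothetical additions for illustration; the sorting logic follows the snippet above.

```java
/** Hypothetical demo: counts merge() calls to show the effect of the skip check. */
class SkipMergeDemo {
    private static int mergeCalls;

    /** Sorts nums and returns how many times merge() actually ran. */
    static int sortAndCountMerges(int[] nums) {
        mergeCalls = 0;
        int[] temp = new int[nums.length];
        sort(nums, 0, nums.length - 1, temp);
        return mergeCalls;
    }

    private static void sort(int[] nums, int left, int right, int[] temp) {
        if (left >= right) {
            return;
        }
        int mid = left + (right - left) / 2;
        sort(nums, left, mid, temp);
        sort(nums, mid + 1, right, temp);
        // Merge only when the two sorted halves are out of order relative to each other
        if (nums[mid] > nums[mid + 1]) {
            merge(nums, left, mid, right, temp);
        }
    }

    private static void merge(int[] nums, int left, int mid, int right, int[] temp) {
        mergeCalls++;
        System.arraycopy(nums, left, temp, left, right - left + 1);
        int l = left, r = mid + 1;
        for (int i = left; i <= right; i++) {
            if (l > mid) {
                nums[i] = temp[r++];
            } else if (r > right || temp[l] <= temp[r]) {
                nums[i] = temp[l++];
            } else {
                nums[i] = temp[r++];
            }
        }
    }

    public static void main(String[] args) {
        int[] sorted = {1, 2, 3, 4, 5, 6, 7, 8};
        int[] reversed = {8, 7, 6, 5, 4, 3, 2, 1};
        System.out.println(sortAndCountMerges(sorted));   // prints 0
        System.out.println(sortAndCountMerges(reversed)); // prints 7
    }
}
```

On sorted input only the n - 1 comparisons of nums[mid] with nums[mid + 1] remain, which is where the linear running time comes from.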

Using insertion sort on small subarrays

Recursing all the way down to tiny subarrays makes recursive calls too frequent. Handling small subarrays with insertion sort instead can generally shorten the running time of merge sort by 10% to 15%. The code is as follows:

    /**
     * Values of M between 5 and 15 work well in most cases
     */
    private final int M = 9;

    private void sort(int[] nums, int left, int right) {
        if (left + M >= right) {
            // Insertion sort, restricted to the current subarray nums[left..right]
            insertSort(nums, left, right);
            return;
        }

        // Divide
        int mid = left + right >> 1;
        sort(nums, left, mid);
        sort(nums, mid + 1, right);
        // Merge
        merge(nums, left, mid, right);
    }

    /**
     * Insertion sort on the subarray nums[left..right] (both ends inclusive)
     */
    private void insertSort(int[] nums, int left, int right) {
        for (int i = left + 1; i <= right; i++) {
            int base = nums[i];

            int j = i - 1;
            while (j >= left && nums[j] > base) {
                nums[j + 1] = nums[j--];
            }
            nums[j + 1] = base;
        }
    }

    private void merge(int[] nums, int left, int mid, int right) {
        // Auxiliary array holding a copy of nums[left..right] (requires java.util.Arrays)
        int[] temp = Arrays.copyOfRange(nums, left, right + 1);

        int leftBegin = 0, leftEnd = mid - left;
        int rightBegin = leftEnd + 1, rightEnd = right - left;
        for (int i = left; i <= right; i++) {
            if (leftBegin > leftEnd) {
                nums[i] = temp[rightBegin++];
            } else if (rightBegin > rightEnd || temp[leftBegin] <= temp[rightBegin]) {
                // "<=" takes equal elements from the left half first, which keeps the sort stable
                nums[i] = temp[leftBegin++];
            } else {
                nums[i] = temp[rightBegin++];
            }
        }
    }
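The three optimizations are presented separately, but they can be combined. The following is one possible combination, offered as a sketch rather than a tuned implementation; the class name HybridMergeSort and the cutoff value 9 are assumptions chosen for illustration.

```java
import java.util.Arrays;

/** Hypothetical class combining the shared buffer, the skip check, and the insertion sort cutoff. */
class HybridMergeSort {
    // Subarrays of at most CUTOFF elements are handled by insertion sort
    private static final int CUTOFF = 9;

    static void sort(int[] nums) {
        int[] temp = new int[nums.length]; // auxiliary array allocated once
        sort(nums, 0, nums.length - 1, temp);
    }

    private static void sort(int[] nums, int left, int right, int[] temp) {
        if (right - left < CUTOFF) {
            insertionSort(nums, left, right);
            return;
        }
        int mid = left + (right - left) / 2;
        sort(nums, left, mid, temp);
        sort(nums, mid + 1, right, temp);
        // Skip the merge when the two halves are already in order
        if (nums[mid] > nums[mid + 1]) {
            merge(nums, left, mid, right, temp);
        }
    }

    /** Insertion sort on nums[left..right], both ends inclusive. */
    private static void insertionSort(int[] nums, int left, int right) {
        for (int i = left + 1; i <= right; i++) {
            int base = nums[i];
            int j = i - 1;
            while (j >= left && nums[j] > base) {
                nums[j + 1] = nums[j--];
            }
            nums[j + 1] = base;
        }
    }

    private static void merge(int[] nums, int left, int mid, int right, int[] temp) {
        System.arraycopy(nums, left, temp, left, right - left + 1);
        int l = left, r = mid + 1;
        for (int i = left; i <= right; i++) {
            if (l > mid) {
                nums[i] = temp[r++];
            } else if (r > right || temp[l] <= temp[r]) {
                nums[i] = temp[l++];
            } else {
                nums[i] = temp[r++];
            }
        }
    }

    public static void main(String[] args) {
        int[] nums = {9, 3, 7, 1, 8, 2, 6, 4, 5, 0, 12, 11, 10};
        sort(nums);
        System.out.println(Arrays.toString(nums));
    }
}
```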


Quick sort

Quick sort also follows the divide-and-conquer approach. It differs from merge sort in two ways: quick sort sorts in place, and it processes the current array first (by partitioning it) and only then sorts the subarrays. Its steps are as follows:

  • Sentinel division (partitioning): Select the leftmost element of the array as the pivot (the base number), then move the elements smaller than the pivot to its left and the elements greater than the pivot to its right.

  • Sorting subarrays: Using the pivot's final index from the sentinel division as the boundary, split the array into left and right subarrays, and recursively apply sentinel division and sorting to each.

The code for quick sort is implemented as follows:

    private void sort(int[] nums, int left, int right) {
        if (left >= right) {
            return;
        }

        // Sentinel division (partitioning)
        int partition = partition(nums, left, right);

        // Sort the two subarrays separately
        sort(nums, left, partition - 1);
        sort(nums, partition + 1, right);
    }

    /**
     * Sentinel division (partitioning)
     */
    private int partition(int[] nums, int left, int right) {
        // Use nums[left] as the pivot and remember its index
        int originIndex = left;
        int base = nums[left];

        while (left < right) {
            // Scan from right to left for an element smaller than the pivot
            while (left < right && nums[right] >= base) {
                right--;
            }
            // Scan from left to right for an element greater than the pivot
            while (left < right && nums[left] <= base) {
                left++;
            }
            swap(nums, left, right);
        }
        // Swap the pivot into the boundary between the two subarrays
        swap(nums, originIndex, left);

        return left;
    }

    private void swap(int[] nums, int left, int right) {
        int temp = nums[left];
        nums[left] = nums[right];
        nums[right] = temp;
    }

Algorithm characteristics:

  • Time complexity: O(n log n) on average and O(n^2) in the worst case

  • Space complexity: In the worst case the recursion depth is n, so the space complexity is O(n)

  • In-place sorting

  • Unstable sorting

  • Adaptive sorting
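The worst-case recursion depth can be observed directly. The sketch below instruments the leftmost-pivot quick sort with a depth counter; the class QuickSortDepthDemo and the counter are hypothetical additions for illustration. On input that is already sorted, every partition leaves one side empty, so the depth grows linearly with the array length.

```java
/** Hypothetical demo: measures the recursion depth of leftmost-pivot quick sort. */
class QuickSortDepthDemo {
    private static int maxDepth;

    /** Sorts nums in place and returns the deepest level at which a partition ran. */
    static int sortAndMeasureDepth(int[] nums) {
        maxDepth = 0;
        sort(nums, 0, nums.length - 1, 1);
        return maxDepth;
    }

    private static void sort(int[] nums, int left, int right, int depth) {
        if (left >= right) {
            return;
        }
        maxDepth = Math.max(maxDepth, depth);
        int p = partition(nums, left, right);
        sort(nums, left, p - 1, depth + 1);
        sort(nums, p + 1, right, depth + 1);
    }

    private static int partition(int[] nums, int left, int right) {
        int originIndex = left;
        int base = nums[left];
        while (left < right) {
            while (left < right && nums[right] >= base) {
                right--;
            }
            while (left < right && nums[left] <= base) {
                left++;
            }
            swap(nums, left, right);
        }
        swap(nums, originIndex, left);
        return left;
    }

    private static void swap(int[] nums, int i, int j) {
        int temp = nums[i];
        nums[i] = nums[j];
        nums[j] = temp;
    }

    public static void main(String[] args) {
        int[] sorted = new int[1000];
        for (int i = 0; i < sorted.length; i++) {
            sorted[i] = i;
        }
        System.out.println(sortAndMeasureDepth(sorted)); // prints 999
    }
}
```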

Merge sort always runs in O(n log n) time, while quick sort degrades to O(n^2) in the worst case. Why, then, is merge sort not as widely used as quick sort?

Answer: Because merge sort is not an in-place sort: its merge phase needs O(n) extra space.

Quick sort has many advantages, but it becomes inefficient when the sentinel division is unbalanced. Here are some ways to optimize quick sort:

Switch to insertion sort

For small arrays, quick sort is slower than insertion sort, and because of the recursion, quick sort's sort() method is called even on subarrays of length 1. Switching to insertion sort for small subarrays is therefore more efficient, as follows:

    /**
     * Values of M between 5 and 15 work well in most cases
     */
    private final int M = 9;

    public void sort(int[] nums, int left, int right) {
        // Use insertion sort for small subarrays, restricted to nums[left..right]
        if (left + M >= right) {
            insertSort(nums, left, right);
            return;
        }

        int partition = partition(nums, left, right);
        sort(nums, left, partition - 1);
        sort(nums, partition + 1, right);
    }

    /**
     * Insertion sort on the subarray nums[left..right] (both ends inclusive)
     */
    private void insertSort(int[] nums, int left, int right) {
        for (int i = left + 1; i <= right; i++) {
            int base = nums[i];

            int j = i - 1;
            while (j >= left && nums[j] > base) {
                nums[j + 1] = nums[j--];
            }
            nums[j + 1] = base;
        }
    }

    private int partition(int[] nums, int left, int right) {
        int originIndex = left;
        int base = nums[left];

        while (left < right) {
            while (left < right && nums[right] >= base) {
                right--;
            }
            while (left < right && nums[left] <= base) {
                left++;
            }
            swap(nums, left, right);
        }
        swap(nums, left, originIndex);

        return left;
    }

    private void swap(int[] nums, int left, int right) {
        int temp = nums[left];
        nums[left] = nums[right];
        nums[right] = temp;
    }

Pivot selection optimization

If the array is in reverse order and the leftmost element is selected as the pivot, every sentinel division leaves one of the two subarrays empty, which makes the time complexity of quick sort O(n^2). To avoid this as far as possible, we can optimize the pivot selection with median-of-three sampling: take the median of the leftmost, middle, and rightmost elements of the array as the pivot. A pivot chosen this way is unlikely to be an extreme value of the range, so the probability of hitting O(n^2) time is greatly reduced. The code is as follows:

    public void sort(int[] nums, int left, int right) {
        if (left >= right) {
            return;
        }

        // Pivot optimization
        betterBase(nums, left, right);

        int partition = partition(nums, left, right);

        sort(nums, left, partition - 1);
        sort(nums, partition + 1, right);
    }

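    /**
     * Pivot optimization: move the median of nums[left], nums[mid], and nums[right] to position left.
     * Note the use of XOR in the conditions: a value lies between the other two
     * when it is smaller than exactly one of them.
     */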
    private void betterBase(int[] nums, int left, int right) {
        int mid = left + right >> 1;

        if ((nums[mid] < nums[right]) ^ (nums[mid] < nums[left])) {
            swap(nums, left, mid);
        } else if ((nums[right] < nums[left]) ^ (nums[right] < nums[mid])) {
            swap(nums, left, right);
        }
    }

    private int partition(int[] nums, int left, int right) {
        int originIndex = left;
        int base = nums[left];

        while (left < right) {
            while (left < right && nums[right] >= base) {
                right--;
            }
            while (left < right && nums[left] <= base) {
                left++;
            }
            swap(nums, left, right);
        }
        swap(nums, originIndex, left);

        return left;
    }

    private void swap(int[] nums, int left, int right) {
        int temp = nums[left];
        nums[left] = nums[right];
        nums[right] = temp;
    }
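The benefit on reversed input can be checked by measuring recursion depth with and without the median-of-three step. The class MedianOfThreeDemo, the boolean switch, and the depth counter below are hypothetical additions for illustration; betterBase() and partition() follow the code above.

```java
/** Hypothetical demo: compares recursion depth with and without median-of-three. */
class MedianOfThreeDemo {
    private static int maxDepth;

    /** Sorts nums in place and returns the deepest level at which a partition ran. */
    static int sortAndMeasureDepth(int[] nums, boolean medianOfThree) {
        maxDepth = 0;
        sort(nums, 0, nums.length - 1, 1, medianOfThree);
        return maxDepth;
    }

    private static void sort(int[] nums, int left, int right, int depth, boolean medianOfThree) {
        if (left >= right) {
            return;
        }
        maxDepth = Math.max(maxDepth, depth);
        if (medianOfThree) {
            betterBase(nums, left, right);
        }
        int p = partition(nums, left, right);
        sort(nums, left, p - 1, depth + 1, medianOfThree);
        sort(nums, p + 1, right, depth + 1, medianOfThree);
    }

    /** Moves the median of nums[left], nums[mid], nums[right] to position left. */
    private static void betterBase(int[] nums, int left, int right) {
        int mid = left + (right - left) / 2;
        if ((nums[mid] < nums[right]) ^ (nums[mid] < nums[left])) {
            swap(nums, left, mid);
        } else if ((nums[right] < nums[left]) ^ (nums[right] < nums[mid])) {
            swap(nums, left, right);
        }
    }

    private static int partition(int[] nums, int left, int right) {
        int originIndex = left;
        int base = nums[left];
        while (left < right) {
            while (left < right && nums[right] >= base) {
                right--;
            }
            while (left < right && nums[left] <= base) {
                left++;
            }
            swap(nums, left, right);
        }
        swap(nums, originIndex, left);
        return left;
    }

    private static void swap(int[] nums, int i, int j) {
        int temp = nums[i];
        nums[i] = nums[j];
        nums[j] = temp;
    }

    public static void main(String[] args) {
        int[] reversed = new int[1000];
        for (int i = 0; i < reversed.length; i++) {
            reversed[i] = reversed.length - i;
        }
        System.out.println("leftmost pivot:  " + sortAndMeasureDepth(reversed.clone(), false));
        System.out.println("median of three: " + sortAndMeasureDepth(reversed.clone(), true));
    }
}
```

With the leftmost pivot the depth on this input is 999; with median-of-three the first partition already splits the array roughly in half, so the depth is guaranteed to be smaller.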

Three-way partitioning

When the array contains many duplicate elements, the recursion in quick sort frequently produces subarrays that consist entirely of equal elements. Sorting these subarrays again is unnecessary, so we can optimize this case.

A simple idea is to partition the array into three parts, holding the elements less than, equal to, and greater than the pivot. Only the "less than" and "greater than" parts are sorted recursively, after which the whole array is sorted. Under this strategy the part equal to the pivot is never sorted again, which improves the efficiency of the algorithm. It works as follows:

Traverse the array from left to right, maintaining a pointer l such that the elements in [left, l - 1] are less than the pivot, a pointer r such that the elements in [r + 1, right] are greater than the pivot, and a pointer mid such that the elements in [l, mid - 1] are equal to the pivot; the elements in [mid, r] have not been compared with the pivot yet. The diagram is as follows:

[Figure: quick sort, Dutch national flag partitioning]

Its code is implemented as follows:

    public void sort(int[] nums, int left, int right) {
        if (left >= right) {
            return;
        }

        // Three-way partitioning
        int l = left, mid = left + 1, r = right;
        int base = nums[l];
        while (mid <= r) {
            if (nums[mid] < base) {
                swap(nums, l++, mid++);
            } else if (nums[mid] > base) {
                swap(nums, mid, r--);
            } else {
                mid++;
            }
        }

        sort(nums, left, l - 1);
        sort(nums, r + 1, right);
    }

    private void swap(int[] nums, int left, int right) {
        int temp = nums[left];
        nums[left] = nums[right];
        nums[right] = temp;
    }

This is also the classic Dutch national flag problem: it is like sorting an array whose keys take only three possible values, corresponding to the three colors of the Dutch flag.
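As a minimal illustration of the flag problem itself, the sketch below sorts an array whose keys are only 0, 1, and 2 in a single pass, using 1 as the fixed middle value. The class DutchFlagDemo and the encoding of the three colors as 0, 1, 2 are assumptions made for this example.

```java
import java.util.Arrays;

/** Hypothetical demo: one-pass sort of an array containing only 0, 1, and 2. */
class DutchFlagDemo {
    static void sortColors(int[] colors) {
        // Invariant: [0, low - 1] holds 0s, [low, mid - 1] holds 1s,
        // [high + 1, end] holds 2s, and [mid, high] is not yet examined
        int low = 0, mid = 0, high = colors.length - 1;
        while (mid <= high) {
            if (colors[mid] == 0) {
                swap(colors, low++, mid++);
            } else if (colors[mid] == 2) {
                // The element swapped in from the right is unexamined, so mid stays put
                swap(colors, mid, high--);
            } else {
                mid++;
            }
        }
    }

    private static void swap(int[] nums, int i, int j) {
        int temp = nums[i];
        nums[i] = nums[j];
        nums[j] = temp;
    }

    public static void main(String[] args) {
        int[] colors = {2, 0, 1, 2, 0, 1, 1};
        sortColors(colors);
        System.out.println(Arrays.toString(colors)); // prints [0, 0, 1, 1, 1, 2, 2]
    }
}
```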


Author: JD Logistics Wang Yilong

Source: JD Cloud Developer Community, Ziyuanqishuo Tech. Please indicate the source when reprinting.

Origin my.oschina.net/u/4090830/blog/10277308