【Data Structures and Algorithms】 Quick sort: random pivot, two-way quick sort, three-way quick sort

You can watch an animated quick sort demo under the QUI tab at https://visualgo.net/zh/sorting.

Quick sort

Average time complexity: O(n log n); worst case: O(n^2)
Space complexity: O(log n) on average for the recursion stack, O(n) in the worst case; no auxiliary array is needed

Basic idea

Divide and conquer.

Pick a pivot (reference element) and partition the array around it: elements smaller than the pivot go to its left, and elements greater than or equal to it go to its right.

This splits the array to be sorted into a left part and a right part.
Repeat the same steps recursively on each part.

4 3 5 6 2 1
Choose 4 as the pivot. After partitioning:
1 3 2 4 5 6
Then repeat on [1 3 2] and [5 6].
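
The partition step on this example can be traced with a small self-contained sketch. It assumes the first element is the pivot, as in the walk-through above; `partitionFirst` is a hypothetical name used only for this illustration.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper for illustration: partitions a[l..r] around a[l]
// and returns the pivot's final index.
int partitionFirst(std::vector<int>& a, int l, int r) {
    assert(0 <= l && l <= r && r < static_cast<int>(a.size()));
    int v = a[l];
    int j = l;  // invariant: a[l+1..j] < v
    for (int i = l + 1; i <= r; ++i) {
        if (a[i] < v) {
            std::swap(a[++j], a[i]);
        }
    }
    std::swap(a[l], a[j]);  // put the pivot between the two parts
    return j;
}
```

Calling it on 4 3 5 6 2 1 with l = 0 and r = 5 returns index 3 and leaves the array as 1 3 2 4 5 6.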

[Figure: quick sort 1]

Basic implementation

#include <utility>  // std::swap
using std::swap;

// Partitions arr[l..r] and returns p such that
// arr[l..p-1] < arr[p] and arr[p+1..r] >= arr[p].
template<typename T>
int __partition(T arr[], int l, int r) {
    T v = arr[l];  // use the first element as the pivot

    // Invariant: arr[l+1..j] < v and arr[j+1..i-1] >= v
    int j = l;
    for (int i = l + 1; i <= r; ++i) {
        if (arr[i] < v) {
            swap(arr[++j], arr[i]);
        }
    }

    swap(arr[l], arr[j]);  // move the pivot to its final position
    return j;
}

// Sorts arr[l..r] in place.
template<typename T>
void __quickSort(T arr[], int l, int r) {
    if (l >= r) return;

    // After __partition: arr[l..p-1] < arr[p] and arr[p+1..r] >= arr[p]
    int p = __partition(arr, l, r);
    __quickSort(arr, l, p - 1);
    __quickSort(arr, p + 1, r);
}

template<typename T>
void quickSort(T arr[], int n) {
    __quickSort(arr, 0, n - 1);
}
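
The timings quoted in this post come from a test harness that is not shown. A minimal sketch of how such a measurement could be made follows; `randomArray` and `timeSort` are hypothetical names, and the use of `std::chrono::steady_clock` and `std::mt19937` is an assumption about tooling, not the author's setup.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <random>
#include <vector>

// Hypothetical helper: n random integers drawn uniformly from [lo, hi].
std::vector<int> randomArray(int n, int lo, int hi, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(lo, hi);
    std::vector<int> data(n);
    for (int& x : data) x = dist(gen);
    return data;
}

// Hypothetical helper: runs sorter(ptr, n) on a private copy of the data,
// checks that the result is sorted, and returns the elapsed time in seconds.
template <typename Sorter>
double timeSort(Sorter sorter, std::vector<int> data) {
    auto start = std::chrono::steady_clock::now();
    sorter(data.data(), static_cast<int>(data.size()));
    auto stop = std::chrono::steady_clock::now();
    assert(std::is_sorted(data.begin(), data.end()));
    return std::chrono::duration<double>(stop - start).count();
}
```

For example, `timeSort(quickSort<int>, randomArray(10000, 0, 10000, 42))` would time the basic implementation above. Absolute numbers depend on the machine and the compiler flags.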

Sorting 10k random numbers in the range [0, 10k]:

Merge sort      :       0.001776 s
Quick sort      :       0.001609 s

Merge sort and quick sort are both O(n log n), so their running times are of the same order of magnitude.

Optimization

Random pivot

The implementation above has a problem. If the array is nearly sorted and we always take the first element as the pivot, each partition produces two very unequal parts: one is almost empty and the other holds nearly all the elements. The following test shows the effect:

Sorting 10k nearly sorted numbers in the range [0, 10k]:

Merge sort      :       0.004252 s
Quick sort      :       2.7637 s

In this case quick sort degenerates to O(n^2).
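
The degeneration can be made concrete by counting comparisons instead of measuring time. The sketch below uses a first-element pivot, as in the basic implementation; `countComparisons` is a hypothetical name introduced for this illustration.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: quick sorts a[l..r] with a first-element pivot and
// returns the number of element comparisons performed.
long long countComparisons(std::vector<int>& a, int l, int r) {
    assert(r < static_cast<int>(a.size()));
    if (l >= r) return 0;
    int v = a[l];
    int j = l;
    long long count = 0;
    for (int i = l + 1; i <= r; ++i) {
        ++count;  // one comparison of a[i] against the pivot
        if (a[i] < v) std::swap(a[++j], a[i]);
    }
    std::swap(a[l], a[j]);
    count += countComparisons(a, l, j - 1);
    count += countComparisons(a, j + 1, r);
    return count;
}
```

On an already sorted array of n elements every partition leaves all remaining elements on one side, so the count is (n-1) + (n-2) + ... + 1 = n(n-1)/2, for example 499500 for n = 1000. The recursion depth also grows to n, which can overflow the stack on large inputs.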
[Figure: quick sort 2]
The cause is the way the pivot is chosen.
So how should the pivot be chosen?

Choose a random element as the pivot.

With a random choice, the probability that the first pivot is the smallest element is 1/n; for the second partition it is 1/(n-1), and so on.

The probability of picking the smallest element every single time is therefore 1/n * 1/(n-1) * 1/(n-2) * ... = 1/n!, which is vanishingly small.

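#include <cstdlib>  // rand, srand
#include <ctime>    // time
#include <utility>  // std::swap
using std::swap;

template<typename T>
int __partition(T arr[], int l, int r) {
    // Swap a randomly chosen element of arr[l..r] into position l and use it as the pivot
    swap(arr[l], arr[rand() % (r-l+1) + l]);
    T v = arr[l];

    // Invariant: arr[l+1..j] < v and arr[j+1..i-1] >= v
    int j = l;
    for (int i = l + 1; i <= r; ++i) {
        if (arr[i] < v) {
            swap(arr[++j], arr[i]);
        }
    }

    swap(arr[l], arr[j]);
    return j;
}

// Sorts arr[l..r] in place
template<typename T>
void __quickSort(T arr[], int l, int r) {
    if (l >= r) return;

    int p = __partition(arr, l, r);
    __quickSort(arr, l, p - 1);
    __quickSort(arr, p + 1, r);
}

template<typename T>
void quickSort(T arr[], int n) {
    srand(time(NULL));  // seed once per sort, not in every recursive call
    __quickSort(arr, 0, n - 1);
}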
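
`rand()` only produces values up to `RAND_MAX`, which is guaranteed to be at least 32767 but may be no larger. When it is that small, `rand() % (r - l + 1)` cannot reach every index of a larger range. One alternative, sketched below, is to draw the index from the C++11 `<random>` facilities. This is a substitute technique, not the code used in this post; `randomPivotIndex` and `partitionRandom` are hypothetical names.

```cpp
#include <cassert>
#include <random>
#include <utility>

// Hypothetical helper: uniform random index in [l, r].
// The engine is seeded once, on first use, rather than on every call.
inline int randomPivotIndex(int l, int r) {
    static std::mt19937 gen(std::random_device{}());
    std::uniform_int_distribution<int> dist(l, r);
    return dist(gen);
}

// Same partition scheme as above, with the pivot index drawn from <random>.
template <typename T>
int partitionRandom(T arr[], int l, int r) {
    assert(l <= r);
    std::swap(arr[l], arr[randomPivotIndex(l, r)]);
    T v = arr[l];
    int j = l;  // invariant: arr[l+1..j] < v and arr[j+1..i-1] >= v
    for (int i = l + 1; i <= r; ++i) {
        if (arr[i] < v) std::swap(arr[++j], arr[i]);
    }
    std::swap(arr[l], arr[j]);
    return j;
}
```

The returned index varies from run to run, but the guarantee is the same: everything to the left of it is smaller than the pivot, and everything to the right is greater than or equal to it.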

Two-way quick sort

With the random-pivot optimization above, quick sort is already good enough for many scenarios.
However, there is another case to consider.
The partition above splits the data into two parts, and every element equal to the pivot ends up in the same part (the right one). If the array contains many elements equal to the pivot, for example 100k numbers in the range [0, 5], the two parts become very unbalanced, one large and one small, and quick sort degrades toward O(n^2) again.

100k random numbers, range [0, 5]:

Merge sort	:	0.015571 s
Quick sort	:	2.00576 s

[Figure: uneven distribution]

How can this be solved?
With two-way quick sort!

Elements equal to the pivot are distributed roughly evenly between the two sides.

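template<typename T>
int __partition2(T arr[], int l, int r) {
    // Swap a randomly chosen element into position l and use it as the pivot
    swap(arr[l], arr[rand() % (r - l + 1) + l]);
    T v = arr[l];

    // Invariant: arr[l+1..i-1] <= v and arr[j+1..r] >= v
    int i = l + 1, j = r;
    while (true) {
        while (i <= r && arr[i] < v) i++;      // stop at the first element >= v
        while (j >= l + 1 && arr[j] > v) j--;  // stop at the last element <= v
        if (i > j) break;
        swap(arr[i++], arr[j--]);  // elements equal to v are swapped too, spreading them over both sides
    }

    swap(arr[l], arr[j]);  // arr[j] <= v, so the pivot can take its place
    return j;
}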
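
The effect of duplicates on the split point can be checked on an array whose elements are all equal. The sketch below uses a fixed first-element pivot so that the result is deterministic; `splitOneWay` and `splitTwoWay` are hypothetical names that mirror the two partition schemes in this post.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// One-way scheme: elements equal to the pivot all stay on the right side.
int splitOneWay(std::vector<int>& a, int l, int r) {
    assert(l <= r);
    int v = a[l], j = l;
    for (int i = l + 1; i <= r; ++i) {
        if (a[i] < v) std::swap(a[++j], a[i]);
    }
    std::swap(a[l], a[j]);
    return j;
}

// Two-way scheme: both scans stop on elements equal to the pivot,
// so equal elements are swapped and spread over both sides.
int splitTwoWay(std::vector<int>& a, int l, int r) {
    assert(l <= r);
    int v = a[l];
    int i = l + 1, j = r;
    while (true) {
        while (i <= r && a[i] < v) ++i;
        while (j >= l + 1 && a[j] > v) --j;
        if (i > j) break;
        std::swap(a[i++], a[j--]);
    }
    std::swap(a[l], a[j]);
    return j;
}
```

For nine equal elements, `splitOneWay` returns index 0 (an empty left part and eight elements on the right), while `splitTwoWay` returns index 4, the middle.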

Two-way quick sort, 100k random numbers, range [0, 5]:

Merge sort	:	0.015794 s
Quick sort	:	0.026956 s

Two-way quick sort and merge sort are now of the same order of magnitude.

Three-way quick sort

Random pivot selection and two-way quick sort solve some of the problems, but the result is still not optimal.

Take 100k random numbers in the range [0, 5] again. Two-way quick sort splits the array into two roughly equal parts, but the many elements equal to the pivot are still moved and sorted again in later rounds, which is repeated work.

Three-way quick sort divides the array into three parts: less than the pivot, equal to the pivot, and greater than the pivot. The next round only has to sort the less-than and greater-than parts; the equal part is already in its final position, so a large share of the work is skipped.

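template <typename T>
void __quickSort3Ways(T arr[], int l, int r) {
    if (l >= r) return ;

    // Swap a randomly chosen element into position l and use it as the pivot
    swap(arr[l], arr[rand() % (r - l + 1) + l]);
    T v = arr[l];

    // partition
    // Invariant: arr[l+1..lt] < v, arr[lt+1..i-1] == v, arr[gt..r] > v
    int i = l + 1;
    int lt = l;
    int gt = r + 1;
    while (i < gt) {
        if (arr[i] < v) {
            swap(arr[++lt], arr[i++]);
        } else if (arr[i] > v) {
            swap(arr[i], arr[--gt]);  // the element swapped in is unexamined, so i stays
        } else {
            ++i;
        }
    }
    swap(arr[l], arr[lt]);  // now arr[l..lt-1] < v and arr[lt..gt-1] == v

    __quickSort3Ways(arr, l, lt - 1);
    __quickSort3Ways(arr, gt, r);
}

template<typename T>
void quickSort3Ways(T arr[], int n) {
    srand(time(NULL));  // seed once per sort, not in every recursive call
    __quickSort3Ways(arr, 0, n - 1);
}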
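
To see the three regions on a small input, the partition step can be isolated so that it returns the boundaries. In this sketch the pivot is fixed to the first element for reproducibility, and `partition3` is a hypothetical name.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: three-way partition of a[l..r] around a[l].
// Returns {lt, gt} such that a[l..lt-1] < v, a[lt..gt-1] == v, a[gt..r] > v.
std::pair<int, int> partition3(std::vector<int>& a, int l, int r) {
    assert(l <= r);
    int v = a[l];
    int lt = l;      // a[l+1..lt] < v
    int gt = r + 1;  // a[gt..r] > v
    int i = l + 1;   // a[lt+1..i-1] == v
    while (i < gt) {
        if (a[i] < v) {
            std::swap(a[++lt], a[i++]);
        } else if (a[i] > v) {
            std::swap(a[i], a[--gt]);
        } else {
            ++i;
        }
    }
    std::swap(a[l], a[lt]);  // move the pivot next to the other equal elements
    return {lt, gt};
}
```

For the input 3 1 3 5 3 2 4 this returns lt = 2 and gt = 5 and leaves the array as 2 1 3 3 3 4 5, so only the ranges [0, 1] and [5, 6] still need sorting.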

Three-way quick sort, 100k random numbers, range [0, 5]:

Merge sort	:	0.015375 s
Two-way quick sort	:	0.026594 s
Three-way quick sort	:	0.00316 s

Three-way quick sort is clearly faster on this input, roughly eight times faster than two-way quick sort.

Performance Testing

100k random numbers, range [0, 100k]:

Merge sort	:	0.022674 s
Two-way quick sort	:	0.03437 s
Three-way quick sort	:	0.037351 s

100k random numbers, range [0, 5]:

Merge sort	:	0.015524 s
Two-way quick sort	:	0.026816 s
Three-way quick sort	:	0.002967 s

100k nearly sorted numbers, range [0, 100k]:

Merge sort	:	0.003909 s
Two-way quick sort	:	0.020788 s
Three-way quick sort	:	0.036348 s

EOF

Origin blog.csdn.net/Hanoi_ahoj/article/details/105495178