Algorithm series: Quicksort

Quicksort is probably the most widely used sorting algorithm. It is popular because it is simple to implement and usually faster in practice than other general-purpose sorting algorithms. More importantly, it sorts in place, so it needs only a small auxiliary stack. On average, sorting an array of length N takes time proportional to N lg N.

Quicksort was proposed by C. A. R. Hoare in 1962. It uses the divide-and-conquer strategy.

In practice, the basic quicksort algorithm is fragile and sensitive to its input data. Later sections explain in detail which situations make quicksort perform very poorly, and how to optimize it.

Basic algorithm

Quicksort partitions the array into two subarrays and sorts each of them independently, applying itself recursively until every subarray holds at most one value. At that point the entire array is sorted.

The general idea of quicksort can be divided into three steps:

  1. Choose a pivot (the reference value).
  2. Move values smaller than the pivot to its left and values larger than the pivot to its right.
  3. Apply the algorithm recursively to the left and right sides until each subarray holds at most one number.

The code is as follows:

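// Helper used below: swap the elements at positions i and j
void exchange(int *a, int i, int j) {
    int t = a[i];
    a[i] = a[j];
    a[j] = t;
}

int partition(int *a, int lo, int hi);  // declared here because quick_sort calls it

void quick_sort(int *a, int lo, int hi){

    if (hi <= lo) return;  // recursion exit condition: zero or one element left
    int j = partition(a, lo, hi); // split the array around the pivot and get the pivot's position
    quick_sort(a, lo, j-1);  // sort the left part
    quick_sort(a, j+1, hi);  // sort the right part
}

int partition(int *a, int lo, int hi) {

    int s = a[lo];    // use the first number as the pivot
    int i = lo;
    int j = hi + 1;

    while(1){
        while(a[++i] <= s) if (i == hi) break; // scan left to right for a number greater than the pivot
        while(a[--j] >= s) if (j == lo) break; // scan right to left for a number smaller than the pivot

        if ( i >= j ) break; // the two scans have met or crossed
        exchange(a, i, j);   // swap the two numbers
    }

    exchange(a, lo, j); // swap the pivot with the last number smaller than the pivot
    return j; // return the pivot's position
}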

We use the following diagram to explain the steps of the algorithm above:

  1. The original data is the first row.
  2. The pivot is the first number, 12.
  3. i scans to the right starting from 1, looking for a number greater than the pivot.
  4. j scans to the left starting from 11, looking for a number smaller than the pivot.
  5. When both are found, swap the two numbers that i and j point to.
  6. Repeat steps 3-5.
  7. When the position of i is greater than or equal to the position of j, stop repeating.
  8. Swap the pivot with the number that j points to.
  9. Split the array into two subarrays around the pivot, and repeat steps 2-8 on each.

[Diagram: trace of the partitioning steps listed above]
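As a rough stand-in for the diagram, the sketch below runs the same partitioning scheme and prints the array after every swap, so the movement of i and j can be followed in the output. The names `partition_traced` and `print_range` are assumptions introduced for this illustration, and the sample array in the note that follows is made up; it is not the data from the original figure.

```c
#include <stdio.h>

/* Swap a[i] and a[j]. */
static void exchange(int *a, int i, int j) {
    int t = a[i];
    a[i] = a[j];
    a[j] = t;
}

/* Print a[lo..hi] on one line, prefixed by a label. */
static void print_range(const char *label, const int *a, int lo, int hi) {
    printf("%-14s", label);
    for (int k = lo; k <= hi; k++) printf(" %2d", a[k]);
    printf("\n");
}

/* Partition a[lo..hi] around a[lo], printing the array after each swap.
   Requires lo < hi. Returns the final position of the pivot. */
int partition_traced(int *a, int lo, int hi) {
    int s = a[lo];
    int i = lo;
    int j = hi + 1;

    print_range("start", a, lo, hi);
    while (1) {
        while (a[++i] <= s) if (i == hi) break;  /* i stops at a value > pivot, or at hi */
        while (a[--j] >= s) if (j == lo) break;  /* j stops at a value < pivot, or at lo */
        if (i >= j) break;                       /* the scans met or crossed */
        exchange(a, i, j);
        print_range("swap i, j", a, lo, hi);
    }
    exchange(a, lo, j);
    print_range("pivot placed", a, lo, hi);
    return j;
}
```

For example, with `int a[] = {12, 5, 20, 3, 15, 8, 30, 1, 11};`, calling `partition_traced(a, 0, 8)` performs two swaps, then places 12 at index 5 and returns 5.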

There are a few things to note in the above algorithm:

1. Stay within bounds

When indexing into the array, make sure the indices never run past the ends of the subarray during the scanning loops. The checks i == hi and j == lo in partition exist for exactly this reason.

2. Terminate the recursion

A recursive algorithm needs a carefully chosen termination condition so that it finishes in a finite number of steps for any input. Here, quick_sort returns as soon as hi <= lo, and every recursive call works on a strictly smaller subarray because the pivot itself is excluded.
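To make both points concrete, here is a minimal self-contained sketch that repeats the quicksort above and runs it on inputs that stress the boundaries: a single element, two elements, all-equal values, sorted data, and reversed data. The helpers `is_sorted` and `check_edge_cases` are hypothetical names introduced for this example.

```c
#include <stdbool.h>

static void exchange(int *a, int i, int j) {
    int t = a[i];
    a[i] = a[j];
    a[j] = t;
}

static int partition(int *a, int lo, int hi) {
    int s = a[lo];
    int i = lo;
    int j = hi + 1;
    while (1) {
        while (a[++i] <= s) if (i == hi) break;  /* never read past a[hi] */
        while (a[--j] >= s) if (j == lo) break;  /* never read before a[lo] */
        if (i >= j) break;
        exchange(a, i, j);
    }
    exchange(a, lo, j);
    return j;
}

void quick_sort(int *a, int lo, int hi) {
    if (hi <= lo) return;  /* termination: zero or one element */
    int j = partition(a, lo, hi);
    quick_sort(a, lo, j - 1);
    quick_sort(a, j + 1, hi);
}

/* Return true if a[0..n-1] is in non-decreasing order. */
bool is_sorted(const int *a, int n) {
    for (int k = 1; k < n; k++)
        if (a[k - 1] > a[k]) return false;
    return true;
}

/* Sort a few boundary-case inputs; return true if all of them come out sorted. */
bool check_edge_cases(void) {
    int one[]    = {7};
    int two[]    = {9, 2};
    int equal[]  = {4, 4, 4, 4, 4};
    int sorted[] = {1, 2, 3, 4, 5, 6};
    int rev[]    = {6, 5, 4, 3, 2, 1};

    quick_sort(one, 0, 0);
    quick_sort(two, 0, 1);
    quick_sort(equal, 0, 4);
    quick_sort(sorted, 0, 5);
    quick_sort(rev, 0, 5);

    return is_sorted(one, 1) && is_sorted(two, 2) && is_sorted(equal, 5)
        && is_sorted(sorted, 6) && is_sorted(rev, 6);
}
```

If either bounds check in `partition` is removed, inputs such as the reversed or all-equal array are the ones most likely to read outside the subarray, which is why they are worth testing first.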

Issues & Improvements

Quicksort has been thoroughly analyzed mathematically, so we can describe its performance precisely.

The inner loop of quicksort's partition method compares array elements against a fixed value while incrementing an index. Such a simple inner loop is rare among sorting algorithms: merge sort and Shellsort, for example, are typically slower than quicksort because they also move data inside their inner loops.

A few factors affect the algorithm's performance:

1. Choice of the partitioning element

In the best case, every partition splits the array exactly in half, so the recursion tree is a balanced binary tree. Stack usage is then at its lowest and efficiency at its highest.

In the worst case, each choice of pivot shrinks the subarray by only one element, so the recursive call stack becomes very deep, the running time grows quadratically, and resources are wasted.
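These two extremes can be summarized with the standard recurrences for the number of comparisons C(N), under the simplifying assumption that partitioning a subarray of size N costs about N comparisons:

```latex
\begin{aligned}
\text{Best case:}\quad  & C(N) = 2\,C(N/2) + N && \Longrightarrow\quad C(N) \sim N \lg N \\
\text{Worst case:}\quad & C(N) = C(N-1) + N    && \Longrightarrow\quad C(N) \sim N^2/2
\end{aligned}
```

The recursion depth follows the same pattern: about lg N levels in the best case and about N levels in the worst case.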

How is this problem solved in practice?

A common approach is to use a median as the pivot. In practice, sampling three elements at random and taking their median as the pivot works well.

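#include <stdlib.h>  // rand

// Sample three random positions in [lo, hi] and return the position
// whose value is the median of the three sampled values
int median(int *a, int lo, int hi) {

    int x[3];  // the three sampled positions

    for (int c = 0; c < 3; c++) {
        x[c] = lo + rand() % (hi - lo + 1);  // random position in [lo, hi]
    }

    int x0 = a[x[0]];
    int x1 = a[x[1]];
    int x2 = a[x[2]];
    int m;  // position of the median value

    if ( x0 >= x1 ) {

        if (x0 >= x2) m = x1 >= x2 ? x[1]:x[2];  // x0 is the largest
        else m = x[0];                           // x1 <= x0 < x2
    } else {

        if ( x1 >= x2 ) m = x0 >= x2 ? x[0]:x[2];  // x1 is the largest
        else m = x[1];                             // x0 < x1 < x2
    }

    return m;
}
// Add this line at the very beginning of the partition function,
// so that subarrays with more than three elements use the sampled median as the pivot
if (hi >= lo + 3) exchange(a, lo, median(a, lo, hi));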
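To see the effect, the following sketch sorts an already-sorted array and records the maximum recursion depth, either with the first element as the pivot or with a random median-of-three pivot. The names `median3`, `sort_depth`, and `max_depth_on_sorted_input`, and the `use_median` flag, are assumptions made for this illustration; `median3` is a compact variant of the sampling idea above, not the same code.

```c
#include <stdlib.h>

static void exchange(int *a, int i, int j) {
    int t = a[i];
    a[i] = a[j];
    a[j] = t;
}

/* Position of the median value among three random positions in [lo, hi]. */
static int median3(const int *a, int lo, int hi) {
    int p = lo + rand() % (hi - lo + 1);
    int q = lo + rand() % (hi - lo + 1);
    int r = lo + rand() % (hi - lo + 1);
    if (a[p] > a[q]) { int t = p; p = q; q = t; }  /* now a[p] <= a[q] */
    if (a[q] > a[r]) { int t = q; q = r; r = t; }  /* now a[r] is the largest */
    if (a[p] > a[q]) { int t = p; p = q; q = t; }  /* now a[p] <= a[q] <= a[r] */
    return q;
}

static int partition(int *a, int lo, int hi, int use_median) {
    if (use_median && hi >= lo + 3) exchange(a, lo, median3(a, lo, hi));
    int s = a[lo];
    int i = lo;
    int j = hi + 1;
    while (1) {
        while (a[++i] <= s) if (i == hi) break;
        while (a[--j] >= s) if (j == lo) break;
        if (i >= j) break;
        exchange(a, i, j);
    }
    exchange(a, lo, j);
    return j;
}

/* Quicksort that records the deepest recursion level reached in *max_depth. */
static void sort_depth(int *a, int lo, int hi, int use_median,
                       int depth, int *max_depth) {
    if (hi <= lo) return;
    if (depth > *max_depth) *max_depth = depth;
    int j = partition(a, lo, hi, use_median);
    sort_depth(a, lo, j - 1, use_median, depth + 1, max_depth);
    sort_depth(a, j + 1, hi, use_median, depth + 1, max_depth);
}

/* Sort the already-sorted input 0, 1, ..., n-1 and return the maximum depth,
   or -1 if memory could not be allocated. */
int max_depth_on_sorted_input(int n, int use_median) {
    int *a = malloc((size_t)n * sizeof *a);
    if (a == NULL) return -1;
    for (int k = 0; k < n; k++) a[k] = k;
    int max_depth = 0;
    sort_depth(a, 0, n - 1, use_median, 1, &max_depth);
    free(a);
    return max_depth;
}
```

With the first element as the pivot, a sorted array of n distinct values reaches depth n - 1. The median-of-three version is typically far shallower, though the exact figure depends on the values `rand()` returns.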

2. Sorting small arrays

On small arrays, quicksort is slower than insertion sort, so switch to insertion sort once a subarray is small enough.

// In the quick_sort function, replace: if (hi <= lo) return;

if (hi <= lo + M) { insertion_sort(a, lo, hi); return; }

// The best cutoff M depends on the system; in general a value from 5 to 15 works well in most cases
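A minimal sketch of this hybrid follows. The `insertion_sort` shown here is a plain textbook version supplied only so the example is complete, the name `quick_sort_cutoff` is hypothetical, and the cutoff `M = 8` is an assumed value picked from the 5-15 range above rather than a tuned one.

```c
#define M 8  /* assumed cutoff; tune for the target system */

static void exchange(int *a, int i, int j) {
    int t = a[i];
    a[i] = a[j];
    a[j] = t;
}

/* Sort a[lo..hi] by inserting each element into the sorted run to its left. */
void insertion_sort(int *a, int lo, int hi) {
    for (int i = lo + 1; i <= hi; i++) {
        int v = a[i];
        int k = i - 1;
        while (k >= lo && a[k] > v) {
            a[k + 1] = a[k];  /* shift larger elements one slot right */
            k--;
        }
        a[k + 1] = v;
    }
}

static int partition(int *a, int lo, int hi) {
    int s = a[lo];
    int i = lo;
    int j = hi + 1;
    while (1) {
        while (a[++i] <= s) if (i == hi) break;
        while (a[--j] >= s) if (j == lo) break;
        if (i >= j) break;
        exchange(a, i, j);
    }
    exchange(a, lo, j);
    return j;
}

/* Quicksort that hands subarrays of at most M + 1 elements to insertion sort. */
void quick_sort_cutoff(int *a, int lo, int hi) {
    if (hi <= lo + M) {
        insertion_sort(a, lo, hi);
        return;  /* without this return, the small subarray would also be partitioned */
    }
    int j = partition(a, lo, hi);
    quick_sort_cutoff(a, lo, j - 1);
    quick_sort_cutoff(a, j + 1, hi);
}
```

The cutoff test also covers the empty and single-element cases, so no separate `hi <= lo` check is needed.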

3. Handling duplicate elements

Standard quicksort splits the array into two parts. When the array contains a large number of identical elements, the algorithm still sorts correctly, but it keeps partitioning subarrays whose elements are all equal, so there is a lot of room for improvement.

E. W. Dijkstra proposed a three-way partitioning version of quicksort.

The algorithm maintains three pointers: lt, gt, and i.

  1. Choose a reference value v;
  2. a[lo...lt-1] elements are all less than v;
  3. a[lt...i-1] elements are all equal to v;
  4. a[i...gt] elements have not been examined yet;
  5. a[gt+1...hi] elements are all greater than v;
  6. When recursing, the run of elements equal to v is left out, which reduces the number of comparisons.

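void quick_sort_3way(int *a, int lo, int hi){

    if (hi <= lo) return;

    int lt = lo, i = lo + 1, gt = hi;
    int s = a[lo];  // pivot value

    while(i <= gt){
        if (a[i] < s) exchange(a, lt++, i++);     // smaller: move into the left region
        else if (a[i] > s) exchange(a, i, gt--);  // larger: move into the right region, then re-examine a[i]
        else i++;                                 // equal: extend the middle region
    }

    // a[lt..gt] now holds every element equal to the pivot, so it is left out of the recursion
    quick_sort_3way(a, lo, lt-1);
    quick_sort_3way(a, gt+1, hi);
}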
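The sketch below restates the three-way scheme as a self-contained example, with the partitioning step split into its own function so that the final values of lt and gt can be inspected. The function name `partition_3way` and its output parameters are assumptions introduced for this illustration.

```c
static void exchange(int *a, int i, int j) {
    int t = a[i];
    a[i] = a[j];
    a[j] = t;
}

/* Rearrange a[lo..hi] around v = a[lo] so that afterwards
   a[lo..*lt-1] < v, a[*lt..*gt] == v, and a[*gt+1..hi] > v.
   Requires lo <= hi. */
void partition_3way(int *a, int lo, int hi, int *lt, int *gt) {
    int v = a[lo];
    int l = lo;      /* a[lo..l-1] < v */
    int i = lo + 1;  /* a[l..i-1] == v, a[i..g] not yet examined */
    int g = hi;      /* a[g+1..hi] > v */

    while (i <= g) {
        if (a[i] < v) exchange(a, l++, i++);
        else if (a[i] > v) exchange(a, i, g--);
        else i++;
    }
    *lt = l;
    *gt = g;
}

void quick_sort_3way(int *a, int lo, int hi) {
    if (hi <= lo) return;
    int lt, gt;
    partition_3way(a, lo, hi, &lt, &gt);
    quick_sort_3way(a, lo, lt - 1);  /* elements less than the pivot */
    quick_sort_3way(a, gt + 1, hi);  /* elements greater than the pivot */
}
```

For example, with `int a[] = {5, 2, 5, 9, 5, 1, 5, 7};`, calling `partition_3way(a, 0, 7, &lt, &gt)` sets lt to 2 and gt to 5, so the four copies of 5 sit in a[2..5] and are never touched again by the recursive calls.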

Summary

Quicksort comes up often in practical applications, interviews, and written tests, and it performs well in practice. The divide-and-conquer strategy also appears in many other algorithms, so it is well worth learning.

This article relies on insertion sort, which will be introduced in the next article.


Please credit the author and source (reposkeeper) when reprinting, and do not use this article for any commercial purpose!

