Sorting Algorithm Learning - Quick Sort

Foreword

In the previous article we implemented merge sort, an O(NlogN) sorting algorithm. In this article we implement another O(NlogN) sorting algorithm, the well-known quick sort.

Quick sort

Principle

We first pick one element of the array as the pivot (the benchmark value) and split the array into two parts: the elements smaller than the pivot and the elements larger than it. Once the split is done, the pivot is in its final sorted position, and the sub-arrays on its left and right are then quick sorted recursively.
Quick sort demo

Code

The figure illustrates the partition step used in the code:
[Figure: picture explanation]

#include <utility>  // std::swap

using std::swap;

/**
Partition arr[L...R].
Returns an index p such that arr[L...p-1] < arr[p] && arr[p+1...R] >= arr[p]
*/
template <typename T>
int __partition(T arr[], int L, int R)
{

    // Use the first element as the pivot
    T v = arr[L];

    // Invariant: arr[L+1...j] < v && arr[j+1...i) >= v
    int j = L;
    for(int i = L + 1; i <= R; i++)
    {
        if(arr[i] < v)
        {
            swap(arr[j+1], arr[i]);
            j++;
        }
    }

    // Finally, move the pivot between the two parts
    swap(arr[L], arr[j]);

    return j;
}

/**
Quick sort arr[L...R]
*/
template <typename T>
void __quickSort(T arr[], int L, int R)
{

    // Base case: a range with at most one element is already sorted
    if(L >= R)
        return;

    int p = __partition(arr, L, R);
    __quickSort(arr, L, p-1);
    __quickSort(arr, p+1, R);
}

template <typename T>
void quickSort(T arr[], int n)
{
    __quickSort(arr, 0, n-1);
}
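
Before moving on to the optimizations, it can help to check that an implementation really sorts. The harness below is a minimal sketch and not part of the original tutorial: sortsCorrectly, its parameters and the seed value are hypothetical names chosen for illustration. It fills an array with random integers, runs the given sort function, and compares the result with std::sort.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper (not from the tutorial): runs sortFn on a random int
// array of length n and reports whether the result matches std::sort.
// A small maxValue produces many duplicate values.
inline bool sortsCorrectly(void (*sortFn)(int[], int), int n, int maxValue, unsigned seed)
{
    assert(n >= 0 && maxValue >= 0);
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, maxValue);

    std::vector<int> data(n);
    for (int &x : data) x = dist(gen);

    // Reference result computed with the standard library.
    std::vector<int> expected = data;
    std::sort(expected.begin(), expected.end());

    sortFn(data.data(), n);  // sort in place with the function under test
    return data == expected;
}
```

Assuming the quickSort template above is in scope, a call such as sortsCorrectly(quickSort<int>, 1000, 10, 42u) would exercise it on 1000 values with many duplicates.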

Optimization

Optimization 1.0

As we can see, the first element is always used as the pivot. The drawback is that when the array is nearly sorted, the time complexity of quick sort degrades to O(N^2).
Quick sort and merge sort are similar in that both use recursion to split the work into layers. The difference is that merge sort always divides the array exactly in half, so the number of layers is:

$\log_2 N$

Quick sort's split is not guaranteed to be balanced. On nearly sorted input the first element is close to the minimum, so one side of the split is almost empty and the number of layers approaches N, which is why it is so slow in the nearly sorted case.
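
To make the layer argument concrete, here is a rough count of the work, assuming that partitioning a range of k elements costs about k comparisons. It is a back-of-the-envelope sketch rather than a formal proof.

```latex
% Balanced case: about \log_2 N layers, each touching about N elements in total
T_{\text{balanced}}(N) \approx N \cdot \log_2 N = O(N \log N)

% Worst case (one side of every split is empty): N layers of sizes N, N-1, \dots, 1
T_{\text{worst}}(N) \approx \sum_{k=1}^{N} k = \frac{N(N+1)}{2} = O(N^2)
```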
Is there a way to solve this problem? Yes. If we pick the pivot at random instead of always taking the first element, nearly sorted input no longer causes trouble. You might object that a random choice can still hit the worst case, and that is true, but with a large amount of data the probability that every random pivot turns out to be the worst one is close to zero.
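
The effect of the pivot choice can be observed by measuring how deep the recursion goes on input that is already sorted. The code below is a sketch for experimentation only: sortAndMeasureDepth and depthOnSortedInput are hypothetical names, it works on std::vector<int> rather than the template used in this article, and it uses <random> with a fixed seed so that runs are repeatable.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <utility>
#include <vector>

// Hypothetical instrumented quick sort: sorts a[L...R] and returns the deepest
// recursion level reached. If randomPivot is true, a random element is swapped
// to the front before partitioning.
inline int sortAndMeasureDepth(std::vector<int> &a, int L, int R,
                               bool randomPivot, std::mt19937 &gen)
{
    if (L >= R) return 0;

    if (randomPivot) {
        std::uniform_int_distribution<int> pick(L, R);
        std::swap(a[L], a[pick(gen)]);
    }

    // Same single-scan partition as in the first version.
    int v = a[L];
    int j = L;
    for (int i = L + 1; i <= R; i++) {
        if (a[i] < v) {
            std::swap(a[j + 1], a[i]);
            j++;
        }
    }
    std::swap(a[L], a[j]);

    int left = sortAndMeasureDepth(a, L, j - 1, randomPivot, gen);
    int right = sortAndMeasureDepth(a, j + 1, R, randomPivot, gen);
    return 1 + std::max(left, right);
}

// Depth reached when sorting the already sorted array 0, 1, ..., n-1.
inline int depthOnSortedInput(int n, bool randomPivot)
{
    assert(n >= 0);
    std::vector<int> a(n);
    for (int k = 0; k < n; k++) a[k] = k;
    std::mt19937 gen(12345);  // fixed seed so the run is repeatable
    int depth = sortAndMeasureDepth(a, 0, n - 1, randomPivot, gen);
    assert(std::is_sorted(a.begin(), a.end()));
    return depth;
}
```

With the first element as the pivot, every split of sorted input leaves the left side empty, so depthOnSortedInput(n, false) is n - 1. With a random pivot the depth should be far smaller, on the order of log N.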

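Optimization 1.0: code changes

First seed the random number generator in the quickSort function:

#include <cstdlib>  // srand, rand
#include <ctime>    // time

template <typename T>
void quickSort(T arr[], int n)
{

    // Optimization 1.0: pick the pivot at random instead of always using the
    // first element, which fixes the nearly sorted case
    srand(time(NULL));

    __quickSort(arr, 0, n-1);
}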

Then pick the pivot at random. Only one line is added to __partition: choose a random element and swap it with the first one:

/**
Partition arr[L...R].
Returns an index p such that arr[L...p-1] < arr[p] && arr[p+1...R] >= arr[p]
*/
template <typename T>
int __partition(T arr[], int L, int R)
{

    // Optimization 1.0: swap a randomly chosen element into the first position
    swap(arr[L], arr[rand()%(R-L+1) + L]);

    // Use the first element as the pivot
    T v = arr[L];

    // Invariant: arr[L+1...j] < v && arr[j+1...i) >= v
    int j = L;
    for(int i = L + 1; i <= R; i++)
    {
        if(arr[i] < v)
        {
            swap(arr[j+1], arr[i]);
            j++;
        }
    }

    // Finally, move the pivot between the two parts
    swap(arr[L], arr[j]);

    return j;
}
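
The expression rand()%(R-L+1) + L maps the result of rand() onto an index in [L, R]. One caveat: RAND_MAX is only guaranteed to be at least 32767, so on some platforms indices far from L can never be chosen when the range is very large. As a possible alternative, and only as a sketch that the original tutorial does not use, the C++11 <random> library can produce the index directly. The helper name randomIndex is hypothetical.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper (not from the tutorial): returns a uniformly
// distributed index in the closed range [L, R].
inline int randomIndex(int L, int R, std::mt19937 &gen)
{
    assert(L <= R);
    std::uniform_int_distribution<int> dist(L, R);  // both bounds are inclusive
    return dist(gen);
}

// Inside __partition the random swap would then read, assuming a generator
// named gen is available:
//     swap(arr[L], arr[randomIndex(L, R, gen)]);
```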

Optimization 2.0

The base case of the recursion can also be changed, for example by switching to insertion sort when the sub-array is small. I will not repeat the explanation here; if you are interested, see how I modified merge sort in the previous article, Sorting Algorithm Learning - Merge Sort.
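
For readers who do not have the previous article at hand, here is a minimal sketch of a range-based insertion sort. Its signature is an assumption, chosen to match the insertionSort(arr, L, R) call that appears in the three-way quick sort code later in this article; the version in the merge sort article may differ.

```cpp
#include <cassert>

// Sorts arr[L...R] in place with insertion sort.
// Assumed signature: it mirrors the insertionSort(arr, L, R) call used later.
template <typename T>
void insertionSort(T arr[], int L, int R)
{
    assert(L >= 0);
    for (int i = L + 1; i <= R; i++) {
        T e = arr[i];
        int j = i;
        // Shift elements larger than e one slot to the right.
        for (; j > L && arr[j - 1] > e; j--)
            arr[j] = arr[j - 1];
        arr[j] = e;
    }
}

// The base case of __quickSort would then become, using the threshold of 15
// that the three-way version uses:
//     if (R - L <= 15) { insertionSort(arr, L, R); return; }
```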

Optimization 3.0

We have just handled the nearly sorted case, but there is another problem: when the array contains many equal elements, the two sides of the partition become unbalanced, and the algorithm again degenerates to O(N^2).
To solve this, we use a new partition method that scans from both ends of the array at the same time. The left scan stops at an element greater than or equal to v, the right scan stops at an element less than or equal to v, the two elements are swapped, and scanning continues. Because elements equal to v are swapped as well, they end up spread over both sides and neither side grows too long.
The figures illustrate the principle:
[Figure 3.0-1]
[Figure 3.0-2]
The code is modified as follows:

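/**
Partition arr[L...R].
Returns an index p such that arr[L...p-1] <= arr[p] && arr[p+1...R] >= arr[p]
*/
template <typename T>
int __partition2(T arr[], int L, int R)
{

    // Optimization 1.0: swap a randomly chosen element into the first position
    swap(arr[L], arr[rand()%(R-L+1) + L]);

    // Use the first element as the pivot
    T v = arr[L];

    // Invariant: arr[L+1...i) <= v && arr(j...R] >= v
    // The initial values of i and j make both ranges empty at the start
    int i = L + 1, j = R;

    while(true)
    {
        // Scan inward from both ends
        while(i <= R && arr[i] < v) i++;
        while(j >= L + 1 && arr[j] > v) j--;
        // Stop once the two scans have crossed
        if(i > j) break;
        // Swap the pair that is out of place
        swap(arr[i], arr[j]);
        i++;
        j--;
    }

    // Finally, move the pivot between the two parts.
    // At this point j is at the last element <= v and i is at the first element >= v
    swap(arr[L], arr[j]);

    return j;
}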

/**
Quick sort arr[L...R] using the two-way partition
*/
template <typename T>
void __quickSort2(T arr[], int L, int R)
{

    // Base case; this can be improved (see Optimization 2.0)
    if(L >= R)
        return;

    int p = __partition2(arr, L, R);
    __quickSort2(arr, L, p-1);
    __quickSort2(arr, p+1, R);
}

template <typename T>
void quickSort2(T arr[], int n)
{

    // Optimization 1.0: pick the pivot at random instead of always using the first element, which fixes the nearly sorted case
    srand(time(NULL));

    __quickSort2(arr, 0, n-1);
}
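
To see why the two-way scan helps, compare where the pivot lands when every element is equal. The sketch below restates both partition methods on std::vector<int> with the random pivot step left out so that the result is deterministic. The function names partitionOneWay and partitionTwoWay are hypothetical and exist only for this comparison.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Single-scan partition from the first version (pivot = first element).
inline int partitionOneWay(std::vector<int> &a, int L, int R)
{
    assert(L <= R);
    int v = a[L];
    int j = L;
    for (int i = L + 1; i <= R; i++) {
        if (a[i] < v) {
            std::swap(a[j + 1], a[i]);
            j++;
        }
    }
    std::swap(a[L], a[j]);
    return j;
}

// Two-way partition from Optimization 3.0, without the random pivot step.
inline int partitionTwoWay(std::vector<int> &a, int L, int R)
{
    assert(L <= R);
    int v = a[L];
    int i = L + 1, j = R;
    while (true) {
        while (i <= R && a[i] < v) i++;
        while (j >= L + 1 && a[j] > v) j--;
        if (i > j) break;
        std::swap(a[i], a[j]);  // equal elements are swapped too
        i++;
        j--;
    }
    std::swap(a[L], a[j]);
    return j;
}
```

On an array of nine equal elements, partitionOneWay(a, 0, 8) returns 0, so the remaining eight elements all fall on one side. partitionTwoWay(a, 0, 8) returns 4, which splits the range evenly.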

Three-way quick sort

Principle

This could be called Optimization 4.0, since the second quick sort (Optimization 3.0) is already very good. Three-way quick sort still deserves its own section, because it is faster when the data contains many equal elements. The built-in sort of some languages also partitions into three parts; Java's Arrays.sort for primitive types, for example, uses a dual-pivot quick sort that splits the array into three parts.

Which three ways? The top lane (less than v), the middle lane (equal to v), and the bottom lane (greater than v). Our earlier quick sorts really had only two ways: less than v, and greater than or equal to v (or with the equal sign on the other side). The problem is that elements equal to the pivot are processed again and again, which lowers the efficiency of the whole algorithm. Three-way quick sort sets the elements equal to v aside, so only the parts less than v and greater than v need further sorting.

In the intermediate state, the index i keeps advancing while the three indexes lt, gt and i are maintained:

[Figure: intermediate state]

In the final state, the partition is complete once the pivot has been swapped into place:

[Figure: final state]

Code

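/**
Three-way quick sort on arr[L...R].
Splits arr[L...R] into three parts: < v, == v and > v,
then recursively sorts the < v and > v parts.
*/
template <typename T>
void __quickSort3Ways(T arr[], int L, int R) {

    // Base case: small ranges are handled by insertion sort.
    // insertionSort(arr, L, R) is assumed to sort arr[L...R] in place.
    if(R - L <= 15) {
        insertionSort(arr, L, R);
        return;
    }

    // Partition
    // Swap a randomly chosen pivot into the first position
    swap(arr[L], arr[rand()%(R-L+1) + L]);
    T v = arr[L];

    int lt = L; // arr[L+1...lt] < v
    int gt = R + 1; // arr[gt...R] > v
    int i = L + 1; // arr[lt+1...i) == v

    while(i < gt) {
        if(arr[i] < v) {
            swap(arr[i], arr[lt+1]);
            lt++;
            i++;
        }
        else if(arr[i] > v) {
            // The element swapped in from gt-1 has not been examined yet, so i stays put
            swap(arr[i], arr[gt-1]);
            gt--;
        }
        else{
            // arr[i] == v
            i++;
        }
    }

    // Move the pivot next to the == v part: now arr[L...lt-1] < v and arr[lt...gt-1] == v
    swap(arr[L], arr[lt]);

    __quickSort3Ways(arr, L, lt-1);
    __quickSort3Ways(arr, gt, R);
}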

template <typename T>
void quickSort3Ways(T arr[], int n) {

    srand(time(NULL));
    __quickSort3Ways(arr, 0, n-1);
}
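
To follow the lt, gt and i indexes on a concrete input, the partition step can be pulled out into its own function. This is only an illustrative sketch: threeWayPartition is a hypothetical name, it works on std::vector<int>, and it takes the first element as the pivot instead of a random one so that the result can be traced by hand.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical standalone version of the three-way partition step.
// Rearranges a[L...R] around v = a[L] and returns {lt, gt} such that
// a[L...lt-1] < v, a[lt...gt-1] == v and a[gt...R] > v.
inline std::pair<int, int> threeWayPartition(std::vector<int> &a, int L, int R)
{
    assert(0 <= L && L <= R && R < static_cast<int>(a.size()));
    int v = a[L];
    int lt = L;      // a[L+1...lt] < v
    int gt = R + 1;  // a[gt...R] > v
    int i = L + 1;   // a[lt+1...i) == v

    while (i < gt) {
        if (a[i] < v) {
            std::swap(a[i], a[lt + 1]);
            lt++;
            i++;
        } else if (a[i] > v) {
            std::swap(a[i], a[gt - 1]);
            gt--;  // i is not advanced: the swapped-in element is unexamined
        } else {
            i++;
        }
    }

    std::swap(a[L], a[lt]);  // the pivot joins the == v block
    return {lt, gt};
}
```

For the input {2, 1, 2, 3, 2, 1, 3, 2} with L = 0 and R = 7, the function returns {2, 6} and leaves the array as {1, 1, 2, 2, 2, 2, 3, 3}. The four elements equal to 2 occupy indexes 2 to 5 and are never touched again by the recursive calls.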

Of course, there is always room for further optimization, but I will stop here for now. Feedback and discussion are welcome.


Image source: Baidu Images
Code based on liuyubobobo's MOOC tutorial


Origin http://43.154.161.224:23101/article/api/json?id=325611934&siteId=291194637