Foreword

This article shares what I have learned about quick sort, one of the exchange sorts, using C. My knowledge is limited, so mistakes are likely; corrections and discussion are welcome.

Quick sort

Quick sort is an exchange sort with a binary-tree-like recursive structure, proposed by Hoare in 1962. The basic idea: pick any element of the sequence to be sorted as the reference value (the key), then partition the sequence into two subsequences so that every element in the left subsequence is less than (or equal to) the key and every element in the right subsequence is greater than (or equal to) the key. The same process is then repeated on the left and right subsequences until every element is in its final position.

// Sort the elements of array in the half-open interval [left, right) in ascending order
void QuickSort(int array[], int left, int right)
{
    if(right - left <= 1)
        return;
    // Partition the elements in [left, right) around the reference value
    int div = partition(array, left, right);
    // div now splits the interval into [left, div) and [div+1, right)
    // Recursively sort [left, div)
    QuickSort(array, left, div);
    // Recursively sort [div+1, right)
    QuickSort(array, div+1, right);
}

The above is the main framework of the recursive implementation of quick sort. It closely resembles a pre-order traversal of a binary tree, so keeping that traversal in mind makes the recursive framework quick to write. What remains is to work out how to partition the data in an interval around the reference value.
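The pre-order analogy can be made concrete with a small sketch. It assumes a hypothetical helper, partition_half_open, which is just one simple way to partition a half-open interval and is not one of the three methods described below. Each call prints its interval before recursing, so the output is the recursion tree in pre-order.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the article): partitions arr[left, right)
   around arr[left] and returns the key's final index. */
static int partition_half_open(int arr[], int left, int right)
{
    int key = arr[left];
    int last_small = left;            /* last index holding a value < key */
    for (int cur = left + 1; cur < right; ++cur) {
        if (arr[cur] < key) {
            ++last_small;
            int tmp = arr[last_small];
            arr[last_small] = arr[cur];
            arr[cur] = tmp;
        }
    }
    arr[left] = arr[last_small];      /* move the key between the two parts */
    arr[last_small] = key;
    return last_small;
}

/* Same shape as a pre-order traversal: handle the "root" (partition),
   then the left part, then the right part. */
void quick_sort_preorder(int arr[], int left, int right, int depth)
{
    assert(arr);
    if (right - left <= 1)
        return;                       /* empty or one element: like a null child */
    int div = partition_half_open(arr, left, right);        /* visit the root */
    printf("%*s[%d, %d) key index %d\n", depth * 2, "", left, right, div);
    quick_sort_preorder(arr, left, div, depth + 1);         /* left subtree  */
    quick_sort_preorder(arr, div + 1, right, depth + 1);    /* right subtree */
}
```

Calling quick_sort_preorder(a, 0, n, 0) on an array of n ints sorts it and prints one indented line per partition, with deeper levels indented further.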

There are three common ways to divide the interval into left and right halves according to the reference value.

(Everything below assumes ascending order.)

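1. Hoare version

Basic explanation

Select a key value, then let L and R start from the head and tail of the array respectively and move toward the middle. L looks for a value larger than the key, and R looks for a value smaller than the key. When both have stopped, the two values are exchanged. This repeats until L and R meet, and then the value at the meeting position is exchanged with the key.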


How do we ensure that the value at the meeting position is smaller than the key?

First, how L and R stop: each stops either when it finds the value it is looking for (larger for L, smaller for R) or when L and R meet.

Use the leftmost element as the key and let R, on the right, move first (L and R do not move at the same time; one must go first).

There are two situations:

R stops at a smaller value, then L moves, finds no larger value, and meets R. The value at the meeting position is the one R stopped on, which is smaller than the key.
L stops at a larger value and the values at L and R are exchanged. In the next round R moves first while L stays where it stopped, and that position now holds a smaller value because of the exchange. R finds no smaller value and meets L, so the value at the meeting position is smaller than the key. (If R meets L before any exchange has happened, they meet on the key itself, which is also fine.)

Similarly, if the rightmost element is used as the key, L must move first, which ensures that the value at the meeting position is greater than the key.
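The effect of the moving order can be checked with a small experiment. This is a sketch under my own naming: hoare_pass and is_partitioned are hypothetical helpers, and the r_first flag exists only to show the counter-example.

```c
#include <assert.h>

static void swap_int(int* a, int* b) { int t = *a; *a = *b; *b = t; }

/* Hoare-style single pass with the key at the left end.
   r_first != 0: R moves first (the order described above).
   r_first == 0: L moves first (shown only as a counter-example). */
int hoare_pass(int* arr, int left, int right, int r_first)
{
    assert(arr);
    int keyi = left;
    while (left < right) {
        if (r_first) {
            while (left < right && arr[right] >= arr[keyi]) --right;
            while (left < right && arr[left]  <= arr[keyi]) ++left;
        } else {
            while (left < right && arr[left]  <= arr[keyi]) ++left;
            while (left < right && arr[right] >= arr[keyi]) --right;
        }
        if (left < right)
            swap_int(&arr[left], &arr[right]);
    }
    swap_int(&arr[left], &arr[keyi]);   /* put the key at the meeting position */
    return left;
}

/* Returns 1 if everything left of p is <= arr[p] and everything
   right of p is >= arr[p], otherwise 0. */
int is_partitioned(const int* arr, int n, int p)
{
    for (int i = 0; i < p; ++i)
        if (arr[i] > arr[p]) return 0;
    for (int i = p + 1; i < n; ++i)
        if (arr[i] < arr[p]) return 0;
    return 1;
}
```

On {1, 2, 3} with R first, R walks all the way to the key and nothing moves. With L first, L stops on 2, R meets it there, and the final exchange puts 2 in front of 1, so the array is no longer partitioned around the returned index.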

Code

Because array elements have to be exchanged, the key cannot be held only in a temporary variable: a copy cannot be exchanged with an array element. So we keep its subscript, keyi, instead.

The condition left < right appears not only in the outer while but also in both inner while loops. Why? It keeps L and R from running past each other (or off the array) while searching, and it also detects that they have met. Pay attention to >= and <= as well: equal values must be skipped, because we are looking for strictly larger and strictly smaller values. If L and R stopped on values equal to the key, equal elements could be exchanged over and over without L and R ever meeting.
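To see why the equal case must be skipped, here is a hedged sketch of an instrumented pass. The function name, the strict flag and the max_rounds guard are all my own additions for the experiment; the guard is there only so that the faulty variant returns instead of spinning forever.

```c
#include <assert.h>

/* Hypothetical instrumented variant of the Hoare pass.
   strict != 0 uses > and < in the inner loops (the mistake);
   strict == 0 uses >= and <= (as in the article).
   Returns the meeting index, or -1 if max_rounds outer iterations
   pass without L and R meeting. */
int hoare_pass_guarded(int* arr, int left, int right, int strict, int max_rounds)
{
    assert(arr);
    int keyi = left;
    int rounds = 0;
    while (left < right) {
        if (++rounds > max_rounds)
            return -1;               /* without the guard this would never end */
        if (strict) {
            while (left < right && arr[right] > arr[keyi]) --right;
            while (left < right && arr[left]  < arr[keyi]) ++left;
        } else {
            while (left < right && arr[right] >= arr[keyi]) --right;
            while (left < right && arr[left]  <= arr[keyi]) ++left;
        }
        if (left < right) {
            int tmp = arr[left];
            arr[left] = arr[right];
            arr[right] = tmp;
        }
    }
    int tmp = arr[left];
    arr[left] = arr[keyi];
    arr[keyi] = tmp;
    return left;
}
```

With the two-element array {5, 5}, the strict variant stops L on the key and R on the other 5 in every round, exchanges them, and makes no progress, so it hits the guard. The non-strict variant lets R walk down to the key and returns index 0.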

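#include <assert.h>

// Exchange the two ints that px and py point to
void Swap(int* px, int* py)
{
    int tmp = *px;
    *px = *py;
    *py = tmp;
}

// One Hoare pass over the closed interval [left, right]; returns the key's final subscript
int PartSort_1(int* arr, int left, int right)
{
    assert(arr);

    int keyi = left;
    while(left < right)
    {
        // R goes first and looks for a value smaller than the key
        while(left < right && arr[right] >= arr[keyi])
            --right;
        // Then L looks for a value larger than the key
        while(left < right && arr[left] <= arr[keyi])
            ++left;
        if(left < right)// Exchange only if L and R have not met; if they have met, the loop must end
            Swap(&arr[left], &arr[right]);
    }

    int meeti = left;
    Swap(&arr[meeti], &arr[keyi]);

    return meeti;// Return the meeting subscript so the sub-intervals can be split later
}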

Significance

This is only a single pass. What is the point?

The value of the key is now in its final position (one value is sorted).
The key also splits the array into a left and a right sub-interval. If both sub-intervals are sorted, the whole array is sorted, which leads to recursion on the sub-problems.
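A worked example may help here. The block below repeats Swap and PartSort_1 from above so that it compiles on its own; PrintArray and the sample array are my own additions for illustration.

```c
#include <assert.h>
#include <stdio.h>

static void Swap(int* px, int* py)
{
    int tmp = *px;
    *px = *py;
    *py = tmp;
}

/* Same single pass as PartSort_1 above, repeated so the block stands alone. */
int PartSort_1(int* arr, int left, int right)
{
    assert(arr);
    int keyi = left;
    while (left < right) {
        while (left < right && arr[right] >= arr[keyi])
            --right;                          /* R looks for a smaller value */
        while (left < right && arr[left] <= arr[keyi])
            ++left;                           /* L looks for a larger value  */
        if (left < right)
            Swap(&arr[left], &arr[right]);
    }
    Swap(&arr[left], &arr[keyi]);             /* key goes to the meeting position */
    return left;
}

/* Hypothetical helper: prints arr[0..n-1] on one line. */
void PrintArray(const int* arr, int n)
{
    for (int i = 0; i < n; ++i)
        printf("%d ", arr[i]);
    printf("\n");
}
```

For the array {6, 1, 2, 7, 9, 3, 4, 5, 10, 8}, PartSort_1(arr, 0, 9) returns 5 and leaves the array as 3 1 2 5 4 6 9 7 10 8. The key 6 sits at its final subscript 5, everything before it is smaller, and everything after it is larger, although neither side is sorted yet.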

Recursive implementation

After the left and right sub-intervals are split, each is handled by recursion. The recursive process has the shape of a binary tree.

meeti is the subscript of the position where L and R meet in each pass, and the left and right sub-intervals are [left, meeti - 1] and [meeti + 1, right]. Do not forget the termination condition if(left >= right) return;. A left boundary greater than or equal to the right boundary means the interval is empty or has only one element, so the recursion should stop and return to the caller.

Code

void QuickSort(int* arr, int left, int right)
{
    assert(arr);

    if(left >= right)
        return;

    int meeti = PartSort_1(arr, left, right);

    QuickSort(arr, left, meeti - 1);
    QuickSort(arr, meeti + 1, right);
}

Time complexity: O(n log n) on average, O(n²) in the worst case

Space complexity: O(log n) on average (recursion stack)

Stability: unstable

Quick sort performs well overall and suits a wide range of scenarios, which is how it earned the name.

Shortcomings

It performs badly on sorted or nearly sorted input.

So far the key is always the leftmost (or rightmost) value. On sorted or nearly sorted input that value is close to the minimum or maximum, so the partition is "one-sided": almost nothing ends up on one side of the key, and nearly the whole interval ends up on the other. The recursion then becomes too deep, which easily overflows the stack, and the time complexity degrades to O(n²).
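The "one-sided" behavior can be measured. The following is a sketch with names of my own choosing (g_max_depth, part_leftmost, max_depth_leftmost_key); it sorts with the leftmost value as the key and records the deepest recursion level that actually ran a partition.

```c
#include <assert.h>

static int g_max_depth;   /* deepest level that actually partitioned */

static void swap_int(int* a, int* b) { int t = *a; *a = *b; *b = t; }

/* Hoare pass with the leftmost value as the key. */
static int part_leftmost(int* arr, int left, int right)
{
    int keyi = left;
    while (left < right) {
        while (left < right && arr[right] >= arr[keyi]) --right;
        while (left < right && arr[left]  <= arr[keyi]) ++left;
        if (left < right)
            swap_int(&arr[left], &arr[right]);
    }
    swap_int(&arr[left], &arr[keyi]);
    return left;
}

static void quick_sort_depth(int* arr, int left, int right, int depth)
{
    if (left >= right)
        return;
    if (depth > g_max_depth)
        g_max_depth = depth;
    int meeti = part_leftmost(arr, left, right);
    quick_sort_depth(arr, left, meeti - 1, depth + 1);
    quick_sort_depth(arr, meeti + 1, right, depth + 1);
}

/* Sorts arr[0..n-1] and returns the maximum recursion depth reached. */
int max_depth_leftmost_key(int* arr, int n)
{
    assert(arr);
    g_max_depth = 0;
    quick_sort_depth(arr, 0, n - 1, 1);
    return g_max_depth;
}
```

On an already sorted array of 1000 distinct values, every pass leaves the key where it was, with an empty left side, so the depth reaches 999 instead of something close to log2(1000), which is about 10.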

Optimizing key selection (median of three)

Can the key selection be improved for sorted or nearly sorted input?

Median of three: take the values at the first, middle, and last positions, choose the one of middle size, exchange it into the first position, and then sort as before. The key is then neither the largest nor the smallest of the three sampled values, which avoids the bad behavior on sorted or nearly sorted input.

Write a function that compares the three values and returns the subscript of the median, then exchange the median with the first value.

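// Return the subscript of the median of arr[left], arr[mid] and arr[right]
int GetMidIndex(int* arr, int left, int right)
{
    int mid = (right - left) / 2 + left;
    if(arr[left] > arr[mid])
    {
        if(arr[mid] > arr[right])
            return mid;      // right < mid < left
        else if(arr[right] < arr[left])
            return right;    // mid <= right < left
        else
            return left;     // mid < left <= right
    }
    else
    {
        if(arr[left] > arr[right])
            return left;     // right < left <= mid
        else if(arr[mid] > arr[right])
            return right;    // left <= right < mid
        else
            return mid;      // left <= mid <= right
    }
}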

int PartSort_1(int* arr, int left, int right)
{
    assert(arr);
    // Move the median of three to the front so that it becomes the key
    int mid = GetMidIndex(arr, left, right);
    Swap(&arr[left], &arr[mid]);
    int keyi = left;
    while(left < right)
    {
        while(left < right && arr[right] >= arr[keyi])
            --right;
        while(left < right && arr[left] <= arr[keyi])
            ++left;
        if(left < right)// Exchange only if L and R have not met; if they have met, the loop must end
            Swap(&arr[left], &arr[right]);
    }

    int meeti = left;
    Swap(&arr[meeti], &arr[keyi]);

    return meeti;// Return the meeting subscript so the sub-intervals can be split later
}
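To get a feel for what median of three buys on sorted input, here is a sketch that records the recursion depth. All names are hypothetical, and median_index is a compact reformulation of the same idea as GetMidIndex, not the article's function.

```c
#include <assert.h>

static int g_depth_max;   /* deepest recursion level that ran a partition */

static void swap_int(int* a, int* b) { int t = *a; *a = *b; *b = t; }

/* Subscript of the median of arr[left], arr[mid], arr[right]. */
static int median_index(const int* arr, int left, int right)
{
    int mid = left + (right - left) / 2;
    int a = arr[left], b = arr[mid], c = arr[right];
    if ((a <= b && b <= c) || (c <= b && b <= a)) return mid;
    if ((b <= a && a <= c) || (c <= a && a <= b)) return left;
    return right;
}

/* Hoare pass whose key is the median of three, moved to the front first. */
static int part_median3(int* arr, int left, int right)
{
    swap_int(&arr[left], &arr[median_index(arr, left, right)]);
    int keyi = left;
    while (left < right) {
        while (left < right && arr[right] >= arr[keyi]) --right;
        while (left < right && arr[left]  <= arr[keyi]) ++left;
        if (left < right)
            swap_int(&arr[left], &arr[right]);
    }
    swap_int(&arr[left], &arr[keyi]);
    return left;
}

static void sort_depth(int* arr, int left, int right, int depth)
{
    if (left >= right)
        return;
    if (depth > g_depth_max)
        g_depth_max = depth;
    int meeti = part_median3(arr, left, right);
    sort_depth(arr, left, meeti - 1, depth + 1);
    sort_depth(arr, meeti + 1, right, depth + 1);
}

/* Sorts arr[0..n-1] and returns the maximum recursion depth reached. */
int max_depth_median_of_three(int* arr, int n)
{
    assert(arr);
    g_depth_max = 0;
    sort_depth(arr, 0, n - 1, 1);
    return g_depth_max;
}
```

On sorted input the median of three is the true median of the interval, so every pass splits it roughly in half and the depth stays near log2(n) rather than growing with n.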

Small-interval optimization

Most nodes of the recursion tree are small sub-intervals. Sorting small intervals directly with insertion sort reduces the number of recursive calls.

In a full binary tree, the bottom three levels hold about 87.5% of all nodes. Each node is one recursive call, so with a large amount of data these three levels account for most of the calls, and their intervals are small. For just a few numbers, sorting directly costs less than recursing. Here a small interval is one with at most 8 elements: once an interval is that small, we stop recursing and use direct insertion sort.
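The 87.5% figure follows from counting nodes level by level, assuming a perfect binary tree with h levels (an idealization of the recursion tree):

```latex
% Perfect binary tree with h levels (h >= 3); level k holds 2^{k-1} nodes.
\[
N_{\text{total}} = \sum_{k=1}^{h} 2^{k-1} = 2^{h} - 1
\]
\[
N_{\text{bottom 3}} = 2^{h-1} + 2^{h-2} + 2^{h-3} = 7 \cdot 2^{h-3}
\]
\[
\frac{N_{\text{bottom 3}}}{N_{\text{total}}}
  = \frac{7 \cdot 2^{h-3}}{2^{h} - 1}
  \approx \frac{7}{8} = 87.5\%
\]
```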

void InsertSort(int* arr, int sz)
{
    assert(arr);

    for(int i = 0; i < sz - 1; ++i)
    {
        int end = i;
        int tmp = arr[end + 1];
        while(end >= 0)
        {
            if(arr[end] > tmp)
            {
                arr[end + 1] = arr[end];
                --end;
            }
            else
                break;
        }
        arr[end + 1] = tmp;
    }
}

void QuickSort(int* arr, int left, int right)
{
    assert(arr);

    if(left >= right)
        return;

    if(right - left + 1 <= 8)
    {
        InsertSort(arr + left, right - left + 1);
    }
    else
    {
        int meeti = PartSort_1(arr, left, right);
        QuickSort(arr, left, meeti - 1);
        QuickSort(arr, meeti + 1, right);
    }
}

Reminder

To be honest, this version has many details to get right, and it is easy to introduce bugs without practice, so it is not the one I recommend.

2. Hole-digging method

This is a variation of the Hoare version. The overall logic is the same; only some implementation details differ, which may make it easier to understand.

In a single pass the leftmost value is still the key, L still looks for a larger value, and R still looks for a smaller value. Instead of exchanging the two values found, however, we "fill holes". First the leftmost value is saved in key, which leaves a "hole" at that position. R goes first and looks for a smaller value; when it finds one, that value fills the hole, and the position it came from becomes the new hole. L then looks for a larger value, fills the hole with it, and leaves a new hole in turn. This continues until L and R meet. The meeting position is itself a hole, and filling it with key completes the pass.


int PartSort_2(int* arr, int left, int right)
{
    assert(arr);
    int mid = GetMidIndex(arr, left, right);
    Swap(&arr[left], &arr[mid]);
    int key = arr[left];
    int hole = left;

    while (left < right)
    {
        while (left < right && arr[right] >= key)
            --right;
        if (left < right)
        {
            arr[hole] = arr[right];
            hole = right;
        }
        while (left < right && arr[left] <= key)
            ++left;
        if (left < right)
        {
            arr[hole] = arr[left];
            hole = left;
        }
    }

    arr[hole] = key;
    return hole;
}
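A small trace makes the hole filling easier to follow. The sketch below is a simplified variant that I call hole_pass_plain: it is the same pass as above but without median of three, so that the key is simply the leftmost value and the steps can be checked by hand.

```c
#include <assert.h>

/* Hypothetical simplified variant of the hole-digging pass:
   the key is arr[left], with no median-of-three step. */
int hole_pass_plain(int* arr, int left, int right)
{
    assert(arr);
    int key = arr[left];      /* saving the key leaves a hole at left */
    int hole = left;

    while (left < right) {
        while (left < right && arr[right] >= key)
            --right;                      /* R looks for a smaller value */
        if (left < right) {
            arr[hole] = arr[right];       /* fill the hole ...           */
            hole = right;                 /* ... and leave a new one     */
        }
        while (left < right && arr[left] <= key)
            ++left;                       /* L looks for a larger value  */
        if (left < right) {
            arr[hole] = arr[left];
            hole = left;
        }
    }

    arr[hole] = key;          /* the meeting position is the last hole */
    return hole;
}
```

For {6, 1, 2, 7, 9, 3, 4, 5, 10, 8} the fills are, in order: 5 into subscript 0, 7 into 7, 4 into 3, 9 into 6, 3 into 4, and finally the key 6 into 5. The result is 5 1 2 4 3 6 9 7 10 8 with return value 5. The key lands on the same subscript as in the Hoare version, but the two sides are arranged differently.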

Everything else stays the same; only the single pass changes.

void QuickSort(int* arr, int left, int right)
{
    assert(arr);

    if(left >= right)
        return;

    // Small interval (at most 8 elements): use insertion sort
    if(right - left + 1 <= 8)
    {
        InsertSort(arr + left, right - left + 1);
    }
    else
    {
        int meeti = PartSort_2(arr, left, right);
        QuickSort(arr, left, meeti - 1);
        QuickSort(arr, meeti + 1, right);
    }
}

3. Front-and-back pointer method

This is also a variation on the Hoare version: the overall framework is the same, and only some implementation details differ.

In a single pass, use two pointers, prev and cur, that both move from left to right. prev starts at the key and cur starts one position after it. Whenever cur is on a value smaller than the key, prev advances one step and the values at prev and cur are exchanged; either way cur then moves on. When cur passes the right boundary, the value at keyi is exchanged with the value at prev.


int PartSort_3(int* arr, int left, int right)
{
    assert(arr);
    int mid = GetMidIndex(arr, left, right);
    Swap(&arr[left], &arr[mid]);
    int keyi = left;
    int prev = left;
    int cur = left + 1;

    while (cur <= right)
    {
        if (arr[cur] < arr[keyi])
        {
            ++prev;
            Swap(&arr[prev], &arr[cur]);
        }

        ++cur;
    }

    Swap(&arr[prev], &arr[keyi]);
    return prev;
}

One small optimization: when prev and cur are at the same position the exchange does nothing, so an extra check can skip it.

int PartSort_3(int* arr, int left, int right)
{
    assert(arr);
    int mid = GetMidIndex(arr, left, right);
    Swap(&arr[left], &arr[mid]);
    int keyi = left;
    int prev = left;
    int cur = left + 1;

    while(cur <= right)
    {
        if(arr[cur] < arr[keyi] && ++prev != cur)
            Swap(&arr[prev], &arr[cur]);

        ++cur;
    }

    Swap(&arr[keyi], &arr[prev]);
    return prev;
}
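The saving from the extra check can be counted. The sketch below uses a hypothetical function, two_pointer_pass, that leaves out median of three to keep the trace simple; its skip_self flag switches the check on and off, and loop_swaps reports how many exchanges happened inside the loop.

```c
#include <assert.h>

static void swap_int(int* a, int* b) { int t = *a; *a = *b; *b = t; }

/* Front-and-back pointer pass with the key at arr[left].
   skip_self != 0 enables the "prev != cur" check.
   *loop_swaps receives the number of exchanges made inside the loop
   (the final exchange with the key is not counted). */
int two_pointer_pass(int* arr, int left, int right, int skip_self, int* loop_swaps)
{
    assert(arr && loop_swaps);
    int keyi = left;
    int prev = left;
    int cur = left + 1;
    *loop_swaps = 0;

    while (cur <= right) {
        if (arr[cur] < arr[keyi]) {
            ++prev;
            if (!skip_self || prev != cur) {
                swap_int(&arr[prev], &arr[cur]);
                ++*loop_swaps;
            }
        }
        ++cur;
    }

    swap_int(&arr[prev], &arr[keyi]);
    return prev;
}
```

For {6, 1, 2, 7, 9, 3, 4, 5, 10, 8} both settings return 5 and produce 5 1 2 3 4 6 9 7 10 8. Without the check the loop performs 5 exchanges, two of which exchange an element with itself (at subscripts 1 and 2); with the check it performs 3.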

Everything else stays the same; only the single pass changes.

void QuickSort(int* arr, int left, int right)
{
    assert(arr);

    if(left >= right)
        return;

    // Small interval (at most 8 elements): use insertion sort
    if(right - left + 1 <= 8)
    {
        InsertSort(arr + left, right - left + 1);
    }
    else
    {
        int meeti = PartSort_3(arr, left, right);
        QuickSort(arr, left, meeti - 1);
        QuickSort(arr, meeti + 1, right);
    }
}

Non-recursive implementation

With a large amount of data, the recursive implementation may overflow the stack if the recursion becomes too deep. A non-recursive implementation avoids this, and writing one requires a deeper understanding of the recursive version.

The idea is to simulate recursion with a loop. A stack stores the left and right boundaries of the intervals still to be processed. The single pass PartSort works on the boundaries it is given and returns the division point, from which the boundaries of the left and right sub-intervals follow.

Boundaries are pushed left first, then right, and popped right first, then left. Initially the boundaries of the whole array are pushed, and the loop runs as long as the stack is not empty. Because a stack is last-in, first-out, the right sub-interval is pushed before the left one, so the left one is popped first. It is partitioned, its right sub-interval is pushed, then its left sub-interval, and the next pop again yields a left interval. This reproduces the left-then-right order of the recursion.

Do not forget if(left >= right) continue;. When the interval is empty or has only one element, there is nothing for PartSort to do.

// ST, StackInit, StackPush, StackTop, StackPop, StackEmpty and StackDestroy
// come from a separate stack-of-int implementation that is not shown here.
void QuickSortNonR(int* arr, int begin, int end)
{
    assert(arr);

    ST st;
    StackInit(&st);

    int left = begin;
    StackPush(&st, left);

    int right = end;
    StackPush(&st, right);

    while(!StackEmpty(&st))
    {
        right = StackTop(&st);
        StackPop(&st);
        left = StackTop(&st);
        StackPop(&st);

        if (left >= right)
            continue;

        int keyi = PartSort_1(arr, left, right);
        // Push the right sub-interval first
        StackPush(&st, keyi + 1);
        StackPush(&st, right);
        // Then push the left sub-interval
        StackPush(&st, left);
        StackPush(&st, keyi - 1);
    }

    StackDestroy(&st);
}
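Since the stack type is not shown, here is a self-contained sketch of the same idea. IntStack and its functions are a hypothetical minimal stack of my own, not the ST used above, and part_hoare repeats the plain Hoare pass so that the block compiles alone.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal growable stack of ints. */
typedef struct {
    int* data;
    int  size;
    int  capacity;
} IntStack;

static void stack_init(IntStack* st) { st->data = NULL; st->size = 0; st->capacity = 0; }
static void stack_destroy(IntStack* st) { free(st->data); stack_init(st); }
static int  stack_empty(const IntStack* st) { return st->size == 0; }

static void stack_push(IntStack* st, int x)
{
    if (st->size == st->capacity) {
        int new_cap = st->capacity ? st->capacity * 2 : 16;
        int* p = realloc(st->data, (size_t)new_cap * sizeof(int));
        assert(p);                    /* sketch only: abort if allocation fails */
        st->data = p;
        st->capacity = new_cap;
    }
    st->data[st->size++] = x;
}

static int stack_pop(IntStack* st)
{
    assert(st->size > 0);
    return st->data[--st->size];
}

/* Plain Hoare pass with the leftmost value as the key. */
static int part_hoare(int* arr, int left, int right)
{
    int keyi = left;
    while (left < right) {
        while (left < right && arr[right] >= arr[keyi]) --right;
        while (left < right && arr[left]  <= arr[keyi]) ++left;
        if (left < right) {
            int t = arr[left]; arr[left] = arr[right]; arr[right] = t;
        }
    }
    int t = arr[left]; arr[left] = arr[keyi]; arr[keyi] = t;
    return left;
}

/* Sorts the closed interval arr[begin..end] without recursion. */
void quick_sort_iterative(int* arr, int begin, int end)
{
    assert(arr);
    IntStack st;
    stack_init(&st);
    stack_push(&st, begin);           /* left boundary first, then right */
    stack_push(&st, end);

    while (!stack_empty(&st)) {
        int right = stack_pop(&st);   /* popped in the reverse order */
        int left  = stack_pop(&st);
        if (left >= right)
            continue;                 /* empty or single-element interval */
        int keyi = part_hoare(arr, left, right);
        stack_push(&st, keyi + 1);    /* right sub-interval goes in first ... */
        stack_push(&st, right);
        stack_push(&st, left);        /* ... so the left one comes out first  */
        stack_push(&st, keyi - 1);
    }
    stack_destroy(&st);
}
```

The explicit stack lives on the heap, so a deep "recursion" costs heap memory rather than call-stack frames. The depth problem on sorted input still shows up as a larger stack and O(n²) time, so median of three remains worthwhile here too.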

Thank you for reading. Your support is my greatest encouragement~

[C-based sorting algorithm] Quick sort of exchange sort