Data Structures and Algorithms (5) Sorting Algorithms


Sorting Algorithms

Congratulations, friends, on reaching the final part: sorting algorithms. Our study of data structures and algorithms is coming to an end. Persistence is victory!

The data in an array often starts out in a messy order, but we frequently need it arranged in order. We have already covered bubble sort and quick sort (optional) in the C language programming series; in this part we will look at more kinds of sorting algorithms.

Let's begin with a review of bubble sort.

Basic Sorting

Bubble Sort

Bubble sort was explained in the C language programming chapter. Its core operation is exchange: by repeatedly swapping adjacent elements, large elements are pushed toward one end bit by bit. Each round moves the largest remaining element into its final position, eventually producing an ordered array. Algorithm demonstration website: https://visualgo.net/zh/sorting?slide=2-2

Assume the array length is N, the detailed process is:

  • A total of N rounds of sorting are performed.
  • Each round starts from the leftmost element of the array and compares adjacent pairs; if the left element is greater than the right one, the two are swapped, otherwise nothing changes.
  • Each round pushes the largest of the remaining elements to the far right; subsequent rounds no longer consider elements already in their final positions.

For example, the following array:

(figure omitted)

Then in the first round of sorting, first compare the first two elements:

(figure omitted)

The former is larger, so we swap them. After the swap, we move on and compare the next pair:

(figure omitted)

The latter is larger, so nothing changes; we continue with the next pair:

(figure omitted)

This time the former is larger, so we swap and continue comparing subsequent elements:

(figure omitted)

The former is larger again, so we swap once more and keep comparing backwards:

(figure omitted)

The former is still larger. Notice that once the largest element starts moving, every comparison pushes it one position further back:

(figure omitted)

Finally, the largest element of the current array has been pushed to the far right, and this round of sorting is over. Since the largest element is now in its final position, in the second round we only need to consider the elements before it:

(figure omitted)

Continuing like this, each round pushes the current largest element to the far right, and after at most N rounds we obtain an ordered array.

The program code is as follows:

void bubbleSort(int arr[], int size){
    for (int i = 0; i < size; ++i) {
        for (int j = 0; j < size - i - 1; ++j) {
            //stop at position N-1 because we compare j and j+1
            //subtracting i skips the elements that are already in place
            if(arr[j] > arr[j + 1]) {
                //if the latter is smaller than the former, swap them
                int tmp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = tmp;
            }
        }
    }
}

This is still the most primitive bubble sort; we can optimize it:

  1. Sorting actually requires only N-1 rounds, not N: in the last round only a single element remains, and a single element is trivially in place, so it need not be considered.
  2. If an entire round completes without a single exchange, the array is already in order (nowhere is an earlier element larger than a later one), and we can stop early.

So, let’s improve it:

void bubbleSort(int arr[], int size){
    for (int i = 0; i < size - 1; ++i) {   //only size-1 rounds are needed
        _Bool flag = 1;   //flag, 1 by default, meaning the array is ordered
        for (int j = 0; j < size - i - 1; ++j) {
            if(arr[j] > arr[j + 1]) {
                flag = 0;    //a swap happened, so the array was not yet ordered
                int tmp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = tmp;
            }
        }
        if(flag) break;   //no swap in this round means the array is already sorted, so stop right here
    }
}

In this way, we have finished writing an optimized version of bubble sort.

Finally, we need to introduce one more concept: the stability of a sorting algorithm. What is stability? A sorting algorithm is stable if two elements of equal value keep their relative order after sorting. The bubble sort above only swaps when the former element is strictly greater than the latter, so it never reorders two equal elements. Therefore, bubble sort is a stable sorting algorithm.

Insertion Sort

Let's introduce a new sorting algorithm: insertion sort (more precisely, direct insertion sort). Its core idea is just like sorting a hand of cards in the game Dou Dizhu (Fight the Landlord).

(figure omitted)

I believe you have all played it. Before each round of the game we draw cards from the pile, and the cards in our hand may end up in a chaotic order. That won't do: without straightening them out, how would we know which cards we hold and how many? To keep the hand ordered, we insert each newly drawn card into its proper position according to the card order, so we never need to re-sort the whole hand later.

Insertion sort works on the same principle. We treat the front part of the array as already sorted (at the beginning, only the first element), then traverse the remaining elements one by one and insert each into its proper place in the sorted part. Animation demonstration address: https://visualgo.net/zh/sorting

Assume the array length is N, the detailed process is:

  • A total of N-1 rounds are performed, since the first element alone counts as sorted.
  • Each round takes the next unsorted element and compares it with the sorted elements from back to front, until an element no larger than it is found; the current element is inserted right after that element.
  • While searching, each sorted element larger than the current one is shifted back one position to make room.
  • When every element has been inserted into its proper position, the sort is complete.

For example, the following array:

(figure omitted)

At this point we assume that the first element is already in order, and we start looking at the second element:

(figure omitted)

We take it out and compare it with the sorted part from back to front. It is first compared with 4 and found to be smaller; moving forward we reach the front of the array, so it belongs at the very beginning. Note that before placing it there, the later elements must be shifted back to make space:

(figure omitted)

Then insert it:

(figure omitted)

Now that the first two elements are in an ordered state, let's continue to look at the third element:

(figure omitted)

Again searching from back to front: 7 is larger, so it is shifted back, and then we hit 4, which is not larger, so the element is inserted right here:

(figure omitted)

Now that the first three elements are all in order, let's continue to look at the fourth element:

(figure omitted)

Comparing backwards in turn, we reach the front without finding any element smaller than 1, so all three sorted elements are shifted back:

(figure omitted)

Insert 1 into the corresponding position:

(figure omitted)

Now the first four elements are in order. We just repeat the same procedure for the remaining elements, and the final result is an ordered array. Let's try to write some code:

void insertSort(int arr[], int size){
    for (int i = 1; i < size; ++i) {   //start from the second element
        int j = i, tmp = arr[i];   //j starts at i since everything before it is sorted; tmp holds the "drawn card"
        while (j > 0 && arr[j - 1] > tmp) {
            //as long as j > 0 and the previous element is larger than the one to insert, keep looking forward
            arr[j] = arr[j - 1];   //shift elements back along the way to make room
            j--;
        }
        arr[j] = tmp;  //wherever j ends up is the insertion position
    }
}

This code can be improved as well: we spend a lot of time comparing elements one by one to find the insertion position. Since the front part of the array is already sorted, we can use binary search to locate the insertion point instead, which saves time in finding it:

int binarySearch(int arr[], int left, int right, int target){
    int mid;
    while (left <= right) {
        mid = (left + right) / 2;
        if(target == arr[mid]) return mid + 1;   //if equal to the middle element, insert right after it
        else if (target < arr[mid])  //middle element larger, so the insertion point must be on the left
            right = mid - 1;   //narrow the range to the left half
        else
            left = mid + 1;   //narrow the range to the right half
    }
    return left;   //once the range closes, left is the insertion position
}

void insertSort(int arr[], int size){
    for (int i = 1; i < size; ++i) {
        int tmp = arr[i];
        int j = binarySearch(arr, 0, i - 1, tmp);   //let binary search determine the insertion position
        for (int k = i; k > j; k--) arr[k] = arr[k - 1];   //still shift the later elements back
        arr[j] = tmp;
    }
}

Finally, let's discuss the stability of insertion sort. The unoptimized version keeps looking forward until it finds an element no larger than the one being inserted, so when it meets an equal element it inserts after it, and the original order of equal elements is preserved. Insertion sort is therefore a stable sorting algorithm. The binary-search-optimized version, however, is not: if the sorted part contains two consecutive equal elements and another equal element arrives, the binary search may land on the first of them and return the position right after it, so the newly inserted element pushes the element originally ranked second further back.

Selection Sort

Finally, let's look at selection sort (more precisely, direct selection sort). It is also easy to understand: each time, we simply search the remaining elements for the smallest one and put it at the front. Algorithm demonstration website: https://visualgo.net/en/sorting

Assume the array length is N, the detailed process is:

  • A total of N-1 rounds of sorting are performed (the last remaining element is necessarily in place).
  • Each round finds the smallest element among the remaining unsorted elements and swaps it with the first element of the unsorted part.
  • After these rounds of exchanges, the array is ordered.

For example, the following array:

(figure omitted)

The first sort requires finding the smallest element in the entire array and exchanging it with the first element:

(figure omitted)

After the exchange, the first element is already in order, and we continue to find the smallest one from the remaining elements:

(figure omitted)

At this time, 2 happens to be in the second position already, so the "swap" exchanges it with itself and changes nothing; the first two elements are now in order. Let's look at the rest:

(figure omitted)

Now 3 is the smallest, so it is swapped directly into the third position:

(figure omitted)

In this way, the first three elements are all in order. Continuing to exchange like this, we finally obtain an ordered array. Let's try to write some code:

void selectSort(int arr[], int size){
    for (int i = 0; i < size - 1; ++i) {   //the last element must end up in place, so only N - 1 rounds are needed
        int min = i;   //record the current minimum, defaulting to the first of the remaining elements
        for (int j = i + 1; j < size; ++j)   //scan the remaining elements, updating whenever a smaller one appears
            if(arr[min] > arr[j])
                min = j;
        int tmp = arr[i];    //after finding the minimum, swap it into place
        arr[i] = arr[min];
        arr[min] = tmp;
    }
}

We can also optimize selection sort: since each round scans the remaining elements anyway, we might as well find both the smallest and the largest at the same time, throwing the small one to the left end and the big one to the right end. This roughly halves the number of rounds.

void swap(int * a, int * b){
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

void selectSort(int arr[], int size){
    int left = 0, right = size - 1;   //both ends are already sorted and the middle is pending, so the range keeps shrinking
    while (left < right) {
        int min = left, max = right;
        for (int i = left; i <= right; i++) {
            if (arr[i] < arr[min]) min = i;   //find the minimum and the maximum at the same time
            if (arr[i] > arr[max]) max = i;
        }
        swap(&arr[max], &arr[right]);   //first swap the largest to the right
        //careful: after moving the maximum to the right, the element swapped out might be the minimum
        //if the scan found the minimum exactly at the right end,
        //it has just been swapped away, so min must be redirected to where it went
        if (min == right) min = max;
        swap(&arr[min], &arr[left]);   //then swap the smallest to the left
        left++;    //after this round, shrink the range
        right--;
    }
}

Finally, let's analyze the stability of selection sort. Selection sort directly swaps the smallest element to the front. Take the sequence 3, 3, 1: the element 1 is selected as the smallest and swapped with the first 3, which sends that 3 behind the second 3 and destroys their original order. Therefore, selection sort is an unstable sorting algorithm.

Let’s summarize the three sorting algorithms we learned above. Assume that the length of the array to be sorted is n:

  • Bubble sort (optimized version):
    • Best case time complexity: O(n). If the array is already ordered, one traversal suffices: the flag detects that no swap occurred and the sort ends immediately.
    • Worst case time complexity: O(n^2), when every round runs in full, for example on a completely reversed array.
    • Space complexity: only one temporary variable is needed for swapping, so O(1).
    • Stability: stable
  • Insertion sort:
    • Best case time complexity: O(n). If the array is already ordered, each element's insertion position is its current position, so no shifting is needed in any round.
    • Worst case time complexity: O(n^2), for example on a completely reversed array, where every round must shift elements all the way to the front.
    • Space complexity: only one variable is needed to hold the extracted element, so O(1).
    • Stability: stable
  • Selection sort:
    • Best case time complexity: O(n^2): even if the array is already ordered, each round must still scan all remaining elements to confirm the minimum.
    • Worst case time complexity: O(n^2), needless to say.
    • Space complexity: each round only records the position of the minimum, so O(1).
    • Stability: unstable

The table is as follows, please remember:

Sorting algorithm   Best case   Worst case   Space complexity   Stability
Bubble sort         O(n)        O(n^2)       O(1)               Stable
Insertion sort      O(n)        O(n^2)       O(1)               Stable
Selection sort      O(n^2)      O(n^2)       O(1)               Unstable

Advanced Sorting

Earlier we introduced three basic sorting algorithms, all with an average time complexity of O(n^2). Can we find faster sorting algorithms? In this part, we introduce the advanced versions of the three algorithms above.

Quick Sort

We also introduced quick sort in the C language programming chapter. Quick sort is an advanced version of bubble sort. In bubble sort, comparisons and exchanges happen between adjacent elements, so each swap moves an element by only one position, and the number of comparisons and moves is high. In quick sort, comparisons and exchanges are performed from both ends toward the middle: a large element can be moved to the back in a single exchange, and a small one to the front, so each move covers a greater distance. There are fewer comparisons and moves and, as the name suggests, it is faster.

In fact, the purpose of each round of quick sort is to throw the larger elements to the right of a pivot and the smaller ones to its left.

Assume the array length is N, the detailed process is:

  • At the beginning, the sorting range is the entire array.
  • Before sorting, we pick the first element of the current range as the pivot.
  • First scan from the rightmost position toward the left, comparing each element with the pivot. When an element smaller than the pivot is found, move it into the left scan position (initially the pivot's slot) and remember the current position on the right.
  • Then scan from the left toward the right. When an element larger than the pivot is found, move it into the remembered position on the right, remember the current position on the left, and repeat the previous step.
  • When the two scans meet, this round of quick sort ends, and the meeting position is where the pivot is placed.
  • With the pivot as the center, split the range into left and right parts and quick sort each of them in the same way.

For example, the following array:

(figure omitted)

First, we select the first element, 4, as the pivot. Initially, the left and right pointers are at the two ends:

(figure omitted)

We start scanning from right to left for an element smaller than 4. The first one is 6, which doesn't qualify, so we move the pointer leftward:

(figure omitted)

Next we compare 3 with 4 and find it smaller, so 3 is moved (in fact, simply copied over) to the position of the left pointer:

(figure omitted)

Now we turn around and scan from left to right for an element larger than 4, to be moved to the right pointer's position. 3 was just moved here, so it doesn't count; next comes 2:

(figure omitted)

2 is not larger than 4, so we keep going; 7 is larger than 4, so it is moved over:

(figure omitted)

Then we scan from right to left again:

(figure omitted)

5 is larger than 4, so we keep moving; then we find 1, which is smaller than 4, so it is moved over:

(figure omitted)

We turn to scan from left to right again, but now the two pointers meet. This round ends, and the meeting position is where the pivot belongs:

(figure omitted)

After this round of quick sort, the left side may not be fully ordered, but every element there is smaller than the pivot, and every element on the right is larger. We then take the pivot as the center, split the array into two parts, and quick sort each again:

(figure omitted)

Repeating this eventually makes the whole array ordered. There are other formulations of quick sort: some find a small element on the right and a large element on the left first and then swap the pair, while our version moves each element as soon as it is found. With the idea clear, let's try to implement quick sort:

void quickSort(int arr[], int start, int end){
    if(start >= end) return;    //the range cannot be split forever; once it is empty or a single element, stop
    int left = start, right = end, pivot = arr[left];   //two pointers at the two ends, and the pivot taken out
    while (left < right) {
        //keep going as long as the two pointers haven't met
        while (left < right && arr[right] >= pivot) right--;   //scan right to left until an element smaller than the pivot appears
        arr[left] = arr[right];    //found one, throw it to the left side
        while (left < right && arr[left] <= pivot) left++;   //scan left to right until an element larger than the pivot appears
        arr[right] = arr[left];    //found one, throw it to the right side
    }
    arr[left] = pivot;    //the meeting position is where the pivot goes
    quickSort(arr, start, left - 1);   //split into left and right parts (excluding the pivot) and quick sort each again
    quickSort(arr, left + 1, end);
}

This completes quick sort. Now let's analyze its stability. Quick sort directly moves elements that are smaller or larger than the pivot. Take the array 2, 2, 1 with the first element as the pivot: scanning from the right, 1 is thrown to the left, giving 1, 2, 1; scanning from the left finds nothing larger than the pivot 2, so the pivot lands in the last position: 1, 2, 2. The 2 that was originally first has ended up after the other 2. Therefore, quick sort is an unstable sorting algorithm.

Dual-Pivot Quick Sort (Optional)

Here we add an upgraded version of quick sort: dual-pivot quick sort. The Arrays utility class in Java uses this method to sort large arrays of primitives. Let's see what it improves over plain quick sort. First, ordinary quick sort can hit an extreme case like this:

(figure omitted)

The entire array happens to be in exactly reverse order, so the first round scans the whole array just to put 8 into the last position:

(figure omitted)

Since 8 goes directly to the far right, there is no right part this time, only the left part, which is quick sorted next:

(figure omitted)

Now 1 is the smallest element, so when the scan finishes, 1 stays where it is; this time there is no left part, only the right part:

(figure omitted)

This time the pivot is 7, the largest element. True to our bad luck, after the round 7 goes to the far end of the range, and again there is no right part:

(figure omitted)

In this extreme case, every round traverses the entire remaining range and pushes just one largest or smallest element to one end. Isn't that exactly bubble sort? So in the worst case, quick sort degenerates into bubble sort, which is why some implementations pick the pivot at random. To handle such extremes, we can instead add a second pivot: even in an extreme situation, unless both ends happen to be minimum or maximum elements, at least one pivot can still split the range normally, and the probability of hitting the extreme case drops considerably:

(figure omitted)

Now the first and last elements both serve as pivots, splitting the whole range into three segments. Assuming pivot 1 is no larger than pivot 2: every element in the first segment must be smaller than pivot 1, every element in the second segment must be no less than pivot 1 and no greater than pivot 2, and every element in the third segment must be greater than pivot 2:

(figure omitted)

So after each round of dual-pivot quick sort, the three resulting segments are each dual-pivot quick sorted in turn, and finally the whole array becomes ordered. Which sizes suit this algorithm? Relatively large arrays: for small ones, given how much work each round does, it is actually slower than insertion sort.

Let's simulate how dual-pivot quick sort works:

(figure omitted)

First, take the first and last elements as the two pivots. We compare them: if pivot 1 is greater than pivot 2, they must be swapped first; here, 4 is less than 6, so no swap is needed.

At this point we need to create three pointers:

(figure omitted)

Because there are three regions: the blue pointer's position and everything to its left are smaller than pivot 1; the region from just right of the blue pointer up to the orange pointer is no less than pivot 1 and no greater than pivot 2; the green pointer's position and everything to its right are greater than pivot 2; and the region between the orange and green pointers is still to be sorted.

We examine the element at the orange pointer; there are three possible cases:

  • If it is less than pivot 1, first move the blue pointer one step right, swap the element with the one at the blue pointer, and then move the orange pointer one step right.
  • If it is no less than pivot 1 and no greater than pivot 2, nothing needs to be done; it is already in the middle region, so just move the orange pointer one step right.
  • If it is greater than pivot 2, it must be thrown to the right: first move the green pointer leftward, continuing until an element no greater than pivot 2 is found, so the two can be swapped.

Let's see: the orange pointer points to 2, which is less than pivot 1. We move the blue pointer right and then swap the elements at the orange and blue pointers, but since they now sit at the same position, nothing changes; both pointers have simply moved one step right:

(figure omitted)

Next, the orange pointer points to 7, which is greater than pivot 2, so we need to find an element on the right that is no greater than pivot 2:

(figure omitted)

The green pointer searches from right to left and finds 3, and the elements at the orange and green pointers are swapped:

(figure omitted)

In the next round, the orange pointer's element is again smaller than pivot 1, so the blue pointer moves right and lands on the orange pointer again; the swap changes nothing, and both pointers move one step right:

(figure omitted)

The new round looks at the orange pointer's element again: 1 is also smaller than pivot 1. Move the blue pointer, swap, then move the orange pointer; as before, the swap is with itself and changes nothing:

(figure omitted)

Now the orange pointer points to 8, which is greater than pivot 2, so again we look for an element on the right that is no greater than pivot 2 to swap with:

(figure omitted)

We find 5, which satisfies the condition, and swap:

(figure omitted)

Continuing, the orange pointer's element is no less than pivot 1 and no greater than pivot 2, so by the rules above we simply move the orange pointer one step right:

(figure omitted)

Now the orange and green pointers meet and no unsorted elements remain. Finally, we swap the two pivots at the end points with the corresponding pointers: pivot 1 with the blue pointer's element, and pivot 2 with the green pointer's element:

(figure omitted)

The three regions now satisfy the required conditions. With luck, as here, the whole array happens to be ordered already; normally, though, we would continue dual-pivot quick sorting each of the three regions until the sort completes.

Now let's try to write the code for dual-pivot quick sort:

void dualPivotQuickSort(int arr[], int start, int end) {
    if(start >= end) return;     //same stop condition as before: once the range is down to one or zero elements, stop
    if(arr[start] > arr[end])    //compare the two end pivots first
        swap(&arr[start], &arr[end]);    //move the larger one to the back
    int pivot1 = arr[start], pivot2 = arr[end];    //take out the two pivots
    int left = start, right = end, mid = left + 1;   //three regions, so three pointers are needed
    while (mid < right) {
        //mid leads the left side, so as long as mid < right there are still unsorted elements between them
        if(arr[mid] < pivot1)     //element at mid smaller than pivot 1: it belongs on the far left
            swap(&arr[++left], &arr[mid++]);   //swap it to the left, then advance both left and mid
        else if (arr[mid] <= pivot2) {
            //not smaller than pivot 1 but no greater than pivot 2: it belongs in the middle
            mid++;   //mid is already in the middle region, so just narrow the range
        } else {
            //otherwise it belongs on the right
            while (arr[--right] > pivot2 && right > mid);  //find a slot on the right for it; note the right pointer moves first
            if(mid >= right) break;   //if no element no greater than pivot 2 remains, this round is already done
            swap(&arr[mid], &arr[right]);   //otherwise we found one, so swap the elements at mid and right
        }
    }
    swap(&arr[start], &arr[left]);    //finally pivot 1 swaps with left: everything to its left is smaller
    swap(&arr[end], &arr[right]);     //and pivot 2 swaps with right: everything to its right is larger
    dualPivotQuickSort(arr, start, left - 1);    //dual-pivot quick sort the three regions again
    dualPivotQuickSort(arr, left + 1, right - 1);
    dualPivotQuickSort(arr, right + 1, end);
}

This part is optional.

Shell Sort

Shell sort (also called diminishing increment sort) is an advanced version of direct insertion sort. Insertion sort is easy to understand, but in extreme cases it must shift all the sorted elements back (for example, when the element to insert happens to be particularly small). To relieve this, Shell sort groups the array by a step size and compares distant elements first.

The step size is taken from an increment sequence, and this sequence is very important. Research has shown that with the increment sequence dlta[k] = 2^(t-k+1) - 1 (0 <= k <= t <= log2(n+1)) the time complexity can be reduced to about O(n^(3/2)), but for simplicity we generally use the sequence n/2, n/4, n/8, ..., 1.

Assume the array length is N, the detailed process is:

  • First compute the initial step size, n/2.
  • Group the array by this step size: elements one step apart belong to the same group, so each group has two elements (if n is odd, the first group has three).
  • Perform insertion sort within each group.
  • After sorting, halve the step size, regroup, and repeat, until the step size reaches 1 and one final full pass of insertion sort finishes the job.

Because the earlier passes have already adjusted the order within the groups, small elements move forward early. Even when a small element must be inserted during the final pass, there are not many elements left to shift back.

Let's take the following array as an example:

(figure omitted)

The array length is 8, and 8 divided by 2 gives 4, so the step size is 4. We group by a step of 4:

(figure omitted)

4 and 8 form the first group, 2 and 5 the second, 7 and 3 the third, and 1 and 6 the fourth. We perform insertion sort within each of the four groups, and the result is:

(figure omitted)

You can see that the small elements have moved toward the front as much as possible, although the array is not yet in order. Then we halve the step size: 4 / 2 = 2, and regroup accordingly:

image-20220905223804907

At this time, 4, 3, 8, and 7 form one group, and 2, 1, 5, and 6 form the other. We continue to sort within these two groups and get:

image-20220905224111803

Finally, we halve the step size once more: 2 / 2 = 1. With a step size of 1, the entire array is a single group, and we perform one last insertion sort. By now the small elements are already near the left, so this final insertion sort is very cheap.

Let’s try to write some code now:

void shellSort(int arr[], int size){
    int delta = size / 2;
    while (delta >= 1) {
        //Still the insertion sort from before, but now we have to respect the grouping
        for (int i = delta; i < size; ++i) {
            //Start from delta, because the first element of each of the first delta groups is trivially sorted
            int j = i, tmp = arr[i];   //As before, take out the element to be inserted first
            while (j >= delta && arr[j - delta] > tmp) {
                //Comparisons step backward by the gap, hence j - delta; j must stay >= delta, otherwise there is no element in front
                arr[j] = arr[j - delta];
                j -= delta;
            }
            arr[j] = tmp;
        }
        delta /= 2;    //After this round of grouped insertion sort, recompute the step size
    }
}

Although three levels of loops are nested here, the actual time complexity is still below O(n^2), because small elements are guaranteed to move left early, so far fewer shifts happen than you might imagine. The proof is too involved to reproduce here.

So is Shell sort stable? Because we group by step size, two equal elements may end up in different groups and be moved forward independently, changing their relative order. Therefore, Shell sort is an unstable sorting algorithm.

Heap sort

Let's look at the last one. Heap sort is also a type of selection sort, but it can be faster than direct selection sort. Remember the max heaps and min heaps we explained earlier? Let's review:

For a complete binary tree, if every parent node in the tree is smaller than its children, we call it a min heap (small top heap), and if every parent node in the tree is larger than its children, it is a max heap (big top heap).

Thanks to the fact that the heap is a complete binary tree, we can easily use an array to represent it:

image-20220818110224673

By building a heap, we can insert the elements of an unordered array one by one, and the sequence we take back out is in sorted order. Taking advantage of this property, we can easily use the heap for sorting. Let's first write a min heap:

#include <stdio.h>
#include <stdlib.h>

typedef int E;
typedef struct MinHeap {
    E * arr;
    int size;
    int capacity;
} * Heap;

_Bool initHeap(Heap heap){
    heap->size = 0;
    heap->capacity = 10;
    heap->arr = malloc(sizeof (E) * heap->capacity);
    return heap->arr != NULL;
}

_Bool insert(Heap heap, E element){
    if(heap->size == heap->capacity) return 0;
    int index = ++heap->size;
    while (index > 1 && element < heap->arr[index / 2]) {
        heap->arr[index] = heap->arr[index / 2];
        index /= 2;
    }
    heap->arr[index] = element;
    return 1;
}

E delete(Heap heap){
    E min = heap->arr[1], e = heap->arr[heap->size--];   //in a min heap the top element is the minimum
    int index = 1;
    while (index * 2 <= heap->size) {
        int child = index * 2;
        if(child < heap->size && heap->arr[child] > heap->arr[child + 1])
            child += 1;
        if(e <= heap->arr[child]) break;
        else heap->arr[index] = heap->arr[child];
        index = child;
    }
    heap->arr[index] = e;
    return min;
}

Then we only need to insert these elements into the heap one by one, and then take them out one by one. What we get is an ordered sequence:

int main(){
    int arr[] = {3, 5, 7, 2, 9, 0, 6, 1, 8, 4};

    struct MinHeap heap;    //First create the heap
    initHeap(&heap);
    for (int i = 0; i < 10; ++i)
        insert(&heap, arr[i]);   //Insert the unordered array elements one by one
    for (int i = 0; i < 10; ++i)
        arr[i] = delete(&heap);    //Then take them out one by one, and they come out in order

    for (int i = 0; i < 10; ++i)
        printf("%d ", arr[i]);
}

The final result is:

image-20220906001134488

Although this approach is simple to use, it requires an additional O(n) of space for the heap, so we can optimize further and reduce the space usage. How? We can change our thinking and build the heap directly inside the given array.

Assume the array length is N, the detailed process is:

  • First adjust the given array in place into a max heap (big top heap).
  • Perform N rounds of selection: each round takes the element at the top of the heap and stores it from the end of the array forward (by exchanging the heap top with the last element of the heap).
  • After the exchange, re-adjust the root node so that the heap again satisfies the max-heap property, then repeat the above operations.
  • When the N rounds are over, the array is arranged from small to large.

We first turn the given array into a complete binary tree, taking the following array as an example:

image-20220906220020172

At this point, this binary tree is not a heap yet, and our first goal is to turn it into a max heap. How? We only need to make adjustments starting from the last non-leaf node and working backward toward the root. For example, 1 is the last non-leaf node here, so we start from 1 and compare: if a child is larger than it, we swap it with the largest child. Here its child 6 is greater than 1, so they are exchanged:

image-20220906221306519

Next, look at the second-to-last non-leaf node, which is 7. Both of its children are smaller, so no adjustment is needed. Then look at the next one back, 2. Both of 2's children, 6 and 8, are greater than 2, so we swap it with the larger of the two:

image-20220906221504364

Finally, the only non-leaf node left is the root. Both children of 4 are greater than 4, so an adjustment is still needed:

image-20220906221657599

The adjustment is not over yet: after 4 is swapped down, its subtree still violates the max-heap property, since the left child of 4 is greater than 4, so we continue downward:

image-20220906221833012

After the exchange, the entire binary tree now satisfies the max-heap property, and the initial build is complete.

At this point, we start the second step: swap out the heap-top elements one by one, which is equivalent to extracting the current maximum each time. First, exchange the heap top with the last element:

image-20220906222327297

At this time, the largest element in the entire array has been placed in its final position, so we no longer consider the last element. The remaining elements in front are still treated as a complete binary tree, and we sift the root down again (only the root needs adjusting, because no other non-leaf node changed) so that they again form a max heap:

image-20220906222819554

It’s not over yet, continue to adjust:

image-20220906222858752

At this point the first round is over, and the second round repeats the same operation: the heap top is again swapped to the second-to-last position, placing the second-largest element where it belongs:

image-20220906222934602

Now two elements are sorted. Likewise, we continue to treat the remaining elements as a complete binary tree and sift the root down so that the max-heap property holds again:

image-20220906223110734

In the third round, the same idea is used, and the largest one is swapped to the back:

image-20220906223326135

After N rounds of sorting, each element can finally be arranged in the corresponding position. Based on the above ideas, let's try to write some code:

//This function sifts down: it heapifies the subtree rooted at start
void makeHeap(int* arr, int start, int end) {
    while (start * 2 + 1 <= end) {
        //Keep going down while there are children, because a swap may break the property in the subtree below
        int child = start * 2 + 1;    //Indices start at 0, so the left child is i * 2 + 1 and the right child is i * 2 + 2
        if(child + 1 <= end && arr[child] < arr[child + 1])   //If a right child exists and is bigger than the left child
            child++;    //then look at the right child instead
        if(arr[child] > arr[start])   //If the chosen child is bigger than the parent, swap: the big one goes up, the small one comes down
            swap(&arr[child], &arr[start]);
        else break;   //Otherwise the subtree already satisfies the max-heap property, so we can stop
        start = child;   //Continue adjusting at the child in the same way
    }
}

void heapSort(int arr[], int size) {
    for(int i = size / 2 - 1; i >= 0; i--)   //First heapify every non-leaf node, from the last one back to the first; size / 2 - 1 is exactly the last non-leaf node
        makeHeap(arr, i, size - 1);
    for (int i = size - 1; i > 0; i--) {
        //Then move the heap top to the back one element at a time, building the sorted order
        swap(&arr[i], &arr[0]);    //Moving it is just swapping with the i-th element from the end, so each round takes the current maximum off the top
        makeHeap(arr, 0, i - 1);   //After each move the old bottom element sits at the top, so sift the root down again
    }
}

Finally, let's analyze the stability of heap sort. Heap sort is essentially also making selections: each time the heap-top element is selected and placed at the back, while the heap is dynamically maintained. When an element is taken from the top, it is swapped with a leaf at the bottom, so the following can happen:

image-20220906223706019

Therefore, heap sort is an unstable sorting algorithm.

Finally, let us summarize the relevant properties of the above three sorting algorithms:

| Sorting Algorithm | Best case | Worst case | Space complexity | Stability |
|---|---|---|---|---|
| Quick sort | O(n log n) | O(n²) | O(log n) | Unstable |
| Shell sort | O(n^1.3) | O(n²) | O(1) | Unstable |
| Heap sort | O(n log n) | O(n log n) | O(1) | Unstable |

Other sorting options

In addition to the several sorting algorithms we introduced earlier, there are also some other types of sorting algorithms. Let’s take a look at them all.

Merge sort

Merge sort uses the idea of recursive divide and conquer: it splits the original array, first sorts the small split arrays, and then merges them into one ordered large array. It is quite easy to understand:

image-20220906232451040

Let's take the following array as an example:

image-20220905223505975

Let's not rush to sort at the beginning. First, split the array in half:

image-20220907135544173

Continue to divide:

image-20220907135744253

In the end, we are left with single elements:

image-20220907135927289

At this point we can start merging and sorting. Note that this is not simple concatenation: we must merge the elements in ascending order. Take the first group, 4 and 2. Each time, we pick the smaller of the two front elements and move it into the merged result:

image-20220907140219455

After sorting is complete, we continue merging upward:

image-20220907141217008

Finally we merge the two arrays back to their original size:

image-20220907141442229

Finally, you will get an ordered array.

In fact, this sorting algorithm is very efficient, but it needs to sacrifice an auxiliary space as large as the original array to merge the split data. The code is as follows:

void merge(int arr[], int tmp[], int left, int leftEnd, int right, int rightEnd){
    int i = left, size = rightEnd - left + 1;   //Save the length of the current range; we need it later
    while (left <= leftEnd && right <= rightEnd) {
        //While both sides still have elements, take the smaller front element next
        if(arr[left] <= arr[right])   //If the left one is smaller, store it into the next slot (i starts from left)
            tmp[i++] = arr[left++];   //Remember to advance both i and left afterwards
        else
            tmp[i++] = arr[right++];
    }
    while (left <= leftEnd)    //If the right side is exhausted, copy the rest of the left side
        tmp[i++] = arr[left++];
    while (right <= rightEnd)   //Same as above
        tmp[i++] = arr[right++];
    for (int j = 0; j < size; ++j, rightEnd--)   //The scratch space now holds this range in order; copy it back into the original array (only within the range)
        arr[rightEnd] = tmp[rightEnd];
}

void mergeSort(int arr[], int tmp[], int start, int end){
    //Merge sort needs auxiliary space as large as the original array
    if(start >= end) return;   //Still recursive, so a range of one element needs no work
    int mid = (start + end) / 2;   //Find the middle position so we can split in half
    mergeSort(arr, tmp, start, mid);   //Merge sort the left half and the right half separately
    mergeSort(arr, tmp, mid + 1, end);
    merge(arr, tmp, start, mid, mid + 1, end);
    //After that, both halves are in order, so one merge over the whole range finishes the job
}

Because the merge always takes the smaller element first, and on a tie the element from the left half is copied back first, equal elements keep their original relative order. Therefore, merge sort is also a stable sorting algorithm.

Bucket sort and radix sort

Before we start explaining bucket sort, let's first take a look at counting sort. It requires an array of length N whose element values fall in the range 0 to M-1 (M less than or equal to N).

Algorithm demonstration website: https://visualgo.net/zh/sorting?slide=1

For example, in the following array, all elements range from 1 to 6:

image-20220907142933725

We first traverse it and count the number of occurrences of each element. Once the counts are complete, we know exactly where the elements of each value belong after sorting:

image-20220907145336855

Let's analyze it. There is only one 1, so it occupies one position; there is only one 2, so it occupies one position; and so on:

image-20220907145437992

Therefore, we can fill these values in one by one directly from the counting results, and the order of equal elements is even preserved: just fill in as many of each value as were counted, in order:

image-20220907145649061

Doesn't it feel very simple? It only needs one traversal for the counting.

Of course there are definitely disadvantages:

  1. When the difference between the maximum and minimum values in the array is too large, we have to allocate a lot of space for the counts, so counting sort is not suitable.
  2. When the element values in the array are not discrete (that is, not integers), there is no way to count them.

Next, let's look at bucket sort, which is an extension of counting sort with a fairly simple idea. It also requires an array of length N with element values in the range 0 to M-1 (M less than or equal to N). For example, suppose there are 1,000 students and we need to sort them by score. Because the score range is 0-100, we can create 101 buckets for classified storage.

For example, the following array:

image-20220907142933725

This array contains elements 1-6, so we can create 6 buckets for statistics:

image-20220907143715938

In this way, one traversal is enough to classify all the elements into these buckets. Finally, we traverse the buckets in order and move the elements back out, obtaining an ordered array:

image-20220907144255326

However, although bucket sort is very fast, it shares the limitations of counting sort above. We can reduce the number of buckets by letting each bucket accept a range of values, but this incurs extra time to sort within each bucket.

Finally, let's look at radix sort. Radix sort also relies on counting, but it never needs an unbounded amount of auxiliary space just because the value range is large. The idea is to use the 10 digits 0-9 as the radix: one traversal classifies the elements by their ones digit into 10 buckets; after the ones digit is done, we look at the tens digit, then the hundreds digit...

Algorithm demonstration website: https://visualgo.net/zh/sorting

image-20220907152403435

First classify by the ones digit, then collect the elements back in bucket order:

image-20220907152903020

Then comes the tens digit:

image-20220907153005797

Finally, take them out again in order:
image-20220907153139536

Successfully obtained ordered array.

Finally, let's summarize the relevant properties of all sorting algorithms:

| Sorting Algorithm | Best case | Worst case | Space complexity | Stability |
|---|---|---|---|---|
| Bubble sort | O(n) | O(n²) | O(1) | Stable |
| Insertion sort | O(n) | O(n²) | O(1) | Stable |
| Selection sort | O(n²) | O(n²) | O(1) | Unstable |
| Quick sort | O(n log n) | O(n²) | O(log n) | Unstable |
| Shell sort | O(n^1.3) | O(n²) | O(1) | Unstable |
| Heap sort | O(n log n) | O(n log n) | O(1) | Unstable |
| Merge sort | O(n log n) | O(n log n) | O(n) | Stable |
| Counting sort | O(n + k) | O(n + k) | O(k) | Stable |
| Bucket sort | O(n + k) | O(n²) | O(n + k) | Stable |
| Radix sort | O(n × k) | O(n × k) | O(n + k) | Stable |

Monkey sort

Monkey sort is rather zen: when the sorting finishes depends entirely on luck!

The infinite monkey theorem was first stated by Émile Borel in a 1909 book on probability, which introduced the concept of "typing monkeys". The theorem is an instance of the zero-one law in probability theory. Roughly, it says that if a monkey presses keys at random on a typewriter for an unbounded amount of time, it will almost surely type out any given text, even the complete works of Shakespeare.

Suppose there is an array of length N:

image-20220907154254943

Each time we randomly select an element from the array and exchange it with a random element:

image-20220907154428792

If you are lucky enough, it might be done in just a few swaps; if you are unlucky, it might not finish even by the time your grandson gets married.

The code is as follows:

#include <stdio.h>
#include <stdlib.h>

_Bool checkOrder(int arr[], int size){
    for (int i = 0; i < size - 1; ++i)
        if(arr[i] > arr[i + 1]) return 0;
    return 1;
}

int main(){
    int arr[] = {3, 5, 7, 2, 9, 0, 6, 1, 8, 4}, size = 10;

    int counter = 0;
    while (1) {
        int a = rand() % size, b = rand() % size;   //rand() is never seeded with srand(), so every run uses the same pseudo-random sequence
        swap(&arr[a], &arr[b]);
        counter++;
        if(checkOrder(arr, size)) break;
    }
    printf("Sorting finished on attempt %d!\n", counter);
}

It can be seen that with 10 elements, sorting succeeded on roughly the 7485618th attempt:

image-20220907160219493

Incidentally, the result is the same on every run because rand() is never seeded with srand(), so the pseudo-random sequence is identical each time.

| Sorting Algorithm | Best case | Worst case | Space complexity | Stability |
|---|---|---|---|---|
| Monkey sort | O(n) | unbounded | O(1) | Unstable |


Origin blog.csdn.net/qq_25928447/article/details/126751213