Thoughts on the connection and progression of the ten sorting algorithms


Preface

Sorting a small handful of numbers is an easy task; anyone who has learned arithmetic can handle it. But once the data reaches a certain scale, sorting quickly and efficiently becomes an interesting challenge. It is in large-data scenarios that sorting algorithms really show their power.
The following discussion tries to cover all of the sorting algorithms in plain language. First, a few points need to be made clear:

  • No sorting algorithm is optimal in all situations.
  • This article only discusses comparison-based sorting; whether the keys are characters or numbers, there must be a rule for comparing them.
  • Stability means that the relative order of any two equal elements is unchanged before and after sorting.

Sorting Algorithms

Bubble Sort

Each pass compares adjacent elements pair by pair and swaps them according to the comparison rule, so the largest (or smallest) element is carried to one end; the problem size then shrinks from n to n-1.
[Figure: bubble sort code]
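Since the code figure is not reproduced here, below is a minimal C sketch of bubble sort under the same idea, assuming an int array (the flag is explained in point 1 below):

    /* Bubble sort: minimal sketch for an int array A of length n. */
    void bubble_sort(int A[], int n)
    {
        for (int pass = n - 1; pass > 0; pass--) {
            int swapped = 0;                 /* the flag: did this pass swap anything? */
            for (int i = 0; i < pass; i++) {
                if (A[i] > A[i + 1]) {       /* adjacent pair out of order */
                    int t = A[i]; A[i] = A[i + 1]; A[i + 1] = t;
                    swapped = 1;
                }
            }
            if (!swapped) break;             /* no swaps: already sorted, stop early */
        }
    }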
The code is fairly simple; here are three notes:

  1. The flag records whether the current bubbling pass performed any swap. If a whole pass makes no swap, the sequence is already in order, the remaining loop passes are unnecessary, and the sort can end immediately.
  2. Time complexity: best case O(N), when the input is already sorted and a single traversal confirms it; worst case O(N²), a completely reversed input with both nested loops running in full.
  3. Its advantages are that it also works on a linked-list structure, and that it is stable.

Insertion sort

Insertion sort can be understood as the process of drawing cards into a sorted hand: each newly drawn card is compared one by one until the right position is found, and it is inserted there. Insertion sort is exactly the algorithm abstracted from this process.
[Figure: insertion sort code]
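A minimal C sketch of the same idea, again assuming an int array:

    /* Insertion sort: like inserting a newly drawn card into a sorted hand. */
    void insertion_sort(int A[], int n)
    {
        for (int i = 1; i < n; i++) {
            int card = A[i];                 /* the newly drawn card */
            int j = i;
            while (j > 0 && A[j - 1] > card) {
                A[j] = A[j - 1];             /* shift larger cards to the right */
                j--;
            }
            A[j] = card;                     /* drop the card into its slot */
        }
    }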
Time complexity and stability are the same as bubble sort.

To summarize the two algorithms above, first define the concept of an "inversion pair": for subscripts i < j, if A[i] > A[j], then (i, j) is an inversion pair. For example, in [3, 1, 2] the pairs (3,1) and (3,2) are inversions. Whether in bubble sort or insertion sort, sorting means eliminating inversion pairs, and each swap of adjacent elements eliminates exactly one of them. This is also why insertion sort is simple and efficient on a nearly ordered sequence. An important conclusion follows: any algorithm that sorts only by swapping adjacent elements has an average time complexity of Ω(N²). So to speed up sorting, each swap must eliminate more than one inversion pair.

Shell sort

Since we want a single exchange to eliminate more than one inversion pair, we can try increasing the distance between the two exchanged elements; a long-distance swap can then eliminate several inversion pairs at once, which leads to a better time complexity while keeping the simplicity of insertion sort.
The original Shell sort can be understood from the following code.
[Figure: original Shell sort code]
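A minimal C sketch of the original version, using the halving gap sequence n/2, n/4, ..., 1; each gap pass is an insertion sort on the interleaved subsequences:

    /* Original Shell sort with the halving increment sequence. */
    void shell_sort(int A[], int n)
    {
        for (int gap = n / 2; gap > 0; gap /= 2) {
            for (int i = gap; i < n; i++) {  /* gapped insertion sort */
                int tmp = A[i];
                int j = i;
                while (j >= gap && A[j - gap] > tmp) {
                    A[j] = A[j - gap];       /* long-distance shift */
                    j -= gap;
                }
                A[j] = tmp;
            }
        }
    }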
The core is to run insertion sort on the subsequences taken at equal intervals, with the interval decreasing along an increment sequence down to 1, which guarantees the final pass leaves everything fully sorted. But the time complexity is still not ideal; how can it be improved?

Improve the increment sequence!

Up to now the increment has simply been halved each time. There are various improved increment sequences (Hibbard, Sedgewick, and others; the original papers are worth a look if you are interested):
[Figure: improved increment sequences]

Selection sort

The pseudocode of selection sort is as follows; it is easy to understand:
[Figure: selection sort pseudocode]
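A minimal C sketch of that pseudocode: each pass finds the minimum of the unsorted part and swaps it into position i.

    /* Selection sort: repeatedly select the minimum of the unsorted part. */
    void selection_sort(int A[], int n)
    {
        for (int i = 0; i < n - 1; i++) {
            int min = i;
            for (int j = i + 1; j < n; j++)
                if (A[j] < A[min]) min = j;  /* remember the smallest seen */
            int t = A[i]; A[i] = A[min]; A[min] = t;  /* long-distance swap */
        }
    }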

Heap sort

[Figure: heap sort code]
Note that the last part of the code assigns the values held in the extra temp space back into the original array.
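The figure's code is not reproduced here; below is a minimal C sketch of heap sort for reference. Note this is the common in-place variant, which swaps the heap top to the end of the array instead of using extra temp space, but the idea (build a max-heap, repeatedly remove the maximum) is the same.

    /* Percolate A[p] down within the max-heap A[0..n-1] (0-based). */
    static void perc_down(int A[], int p, int n)
    {
        int top = A[p];
        int child;
        for (; 2 * p + 1 < n; p = child) {
            child = 2 * p + 1;
            if (child + 1 < n && A[child + 1] > A[child]) child++;
            if (top >= A[child]) break;
            A[p] = A[child];                 /* move the larger child up */
        }
        A[p] = top;
    }

    void heap_sort(int A[], int n)
    {
        for (int i = n / 2 - 1; i >= 0; i--) /* build the max-heap, O(N) */
            perc_down(A, i, n);
        for (int i = n - 1; i > 0; i--) {    /* repeatedly move the max to the end */
            int t = A[0]; A[0] = A[i]; A[i] = t;
            perc_down(A, 0, i);
        }
    }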

Merge sort

Regarding merge sort, we must grasp one core idea: the merging of ordered sublists.
[Figure: merging ordered sublists A and B]
For example, in the figure above, sublist A and sublist B are merged into one large ordered list by repeatedly moving the two pointers and comparing; since every element is scanned exactly once, the time complexity is O(N).
The implementation keeps one pointer into each of the two sorted subsequences to be merged. Whichever current element is smaller is moved into the result array first and its pointer advances to the right, until every element of one subsequence has been scanned; the remaining part of the other subsequence can then be moved into the result array directly. Note the final trick when assigning back to the original array: copy from right to left. Think about why not left to right?
The pseudocode is as follows:
[Figure: Merge pseudocode]
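A minimal C sketch of this Merge routine, following the description above (names are my own; A[lo..mid] and A[mid+1..hi] are the two sorted runs, tmp is the scratch array):

    /* Merge the sorted runs A[lo..mid] and A[mid+1..hi] using tmp[]. */
    void merge(int A[], int tmp[], int lo, int mid, int hi)
    {
        int i = lo, j = mid + 1, k = lo;
        while (i <= mid && j <= hi)          /* move the smaller head first */
            tmp[k++] = (A[i] <= A[j]) ? A[i++] : A[j++];
        while (i <= mid) tmp[k++] = A[i++];  /* drain whichever run remains */
        while (j <= hi)  tmp[k++] = A[j++];
        for (k = hi; k >= lo; k--)           /* copy back, right to left */
            A[k] = tmp[k];
    }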
The next question is how to organize the whole sort around this merging routine.

  1. It is not hard to see that this can be realized with divide and conquer plus recursion: keep splitting the original array and solve the subproblems. The code is implemented as follows (a C sketch is given at the end of this item); note that it calls the Merge function written above.
    [Figure: recursive merge sort code]
    The time complexity can be derived from the classic divide-and-conquer recurrence T(N) = 2T(N/2) + O(N), and the result is the expected O(N log N).
    [Figure: recurrence derivation]
    You can derive it manually to deepen your memory.
    [Figure: expansion of the recurrence]
    Note that it is O(N log N) in all cases, and it is stable. (Why? Consider the code above.)
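    A minimal C sketch of the recursive driver, using the merge() sketch above (names are my own):

        #include <stdlib.h>

        /* Divide and conquer: sort A[lo..hi] recursively, then merge. */
        static void msort(int A[], int tmp[], int lo, int hi)
        {
            if (lo >= hi) return;            /* 0 or 1 element: already sorted */
            int mid = (lo + hi) / 2;
            msort(A, tmp, lo, mid);          /* sort the left half */
            msort(A, tmp, mid + 1, hi);      /* sort the right half */
            merge(A, tmp, lo, mid, hi);      /* merge the two sorted halves */
        }

        void merge_sort(int A[], int n)      /* external interface */
        {
            int *tmp = malloc(n * sizeof(int));
            if (tmp) { msort(A, tmp, 0, n - 1); free(tmp); }
        }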

  2. How can a non-recursive version be implemented?

The idea is to merge adjacent subsequences of a given length on each pass, doubling length from 1 until the whole array is one sorted sequence.
Now consider the extra space this process needs, with the help of the figure below.
[Figure: arrays produced by successive merge passes]
The picture is actually somewhat misleading: the correct answer is O(N), because you only ever need two arrays of length N (the original and one temporary), assigning values back and forth between them. Think this process through carefully.
The code is implemented as shown in the figure below:
[Figure: non-recursive merge pass]
The external interface of merge sort is as follows:
[Figure: merge sort external interface]
Pay attention to the part in the red box: it is what alternates between the space of the result array and the space of the temporary array, using the two back and forth.
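A minimal C sketch of the non-recursive version under the same assumptions (names are my own). Each merge_pass merges adjacent runs of the current length, and src/dst swap roles after every pass, which is exactly the back-and-forth use of the two spaces described above:

    #include <stdlib.h>

    /* Merge the sorted runs src[lo..mid] and src[mid+1..hi] into dst. */
    static void merge_run(int src[], int dst[], int lo, int mid, int hi)
    {
        int i = lo, j = mid + 1, k = lo;
        while (i <= mid && j <= hi)
            dst[k++] = (src[i] <= src[j]) ? src[i++] : src[j++];
        while (i <= mid) dst[k++] = src[i++];
        while (j <= hi)  dst[k++] = src[j++];
    }

    /* One pass: merge adjacent runs of the given length from src into dst. */
    static void merge_pass(int src[], int dst[], int n, int length)
    {
        int i;
        for (i = 0; i + 2 * length <= n; i += 2 * length)
            merge_run(src, dst, i, i + length - 1, i + 2 * length - 1);
        if (i + length < n)                  /* one full run plus a shorter tail */
            merge_run(src, dst, i, i + length - 1, n - 1);
        else                                 /* a single leftover run: copy it */
            for (; i < n; i++) dst[i] = src[i];
    }

    void merge_sort_iter(int A[], int n)     /* external interface */
    {
        int *tmp = malloc(n * sizeof(int));
        if (!tmp) return;
        int *src = A, *dst = tmp;
        for (int length = 1; length < n; length *= 2) {
            merge_pass(src, dst, n, length); /* runs of this length get merged */
            int *t = src; src = dst; dst = t;/* swap roles: back and forth */
        }
        if (src != A)                        /* final result ended up in tmp */
            for (int i = 0; i < n; i++) A[i] = src[i];
        free(tmp);
    }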

Now we can take stock: merge sort looks great, with worst-case and average complexity both O(N log N), and it is stable. Isn't that the perfect sorting algorithm? Not quite: it needs extra space and has to copy back and forth between arrays, so it is rarely used for internal sorting and is mostly used for external sorting.

Quick sort

Next up is the most commonly used sorting algorithm, widely recognized as the fastest: quicksort. I once wrote a more vivid, dialogue-style article about quicksort; please see it here. Its idea can be seen in the figure below. It does not feel very complicated, but several value choices in the implementation must be made very carefully; one careless choice and performance drops sharply. For details, see my quicksort article.
[Figure: the idea of quicksort]
Here is a summary of the points of attention:

  1. The best case for quicksort is that the pivot chosen each time is exactly the median, which gives time complexity O(N log N).
  2. How should the pivot be chosen? Can we simply take the first element as the pivot? What is the worst case then?
    [Figure: worst case when the first element is the pivot]
    As you might expect, performance blows up: on an already sorted input, each partition strips off only one element and the cost degrades to O(N²). How can this be improved?
    That's right, plenty of people have thought about this problem for a long time. The most commonly used method today is to take the median of the first, middle, and last elements as the pivot. The code is as follows, with a sketch at the end of this item (think about it: would taking a random element work too?)
    [Figure: median-of-three pivot selection code]
    Think about the last two lines of that code: since the median is already known to be smaller than the rightmost element, it can be parked directly at the position just before right, which saves the overhead of comparing the two boundary elements during the partition scan.
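    A minimal C sketch of median-of-three pivot selection under this idea (names are my own; swap is a plain helper):

        /* Exchange two ints through pointers. */
        static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

        /* Order A[left], A[center], A[right], then park the pivot at right-1. */
        static int median3(int A[], int left, int right)
        {
            int center = (left + right) / 2;
            if (A[left] > A[center])  swap(&A[left], &A[center]);
            if (A[left] > A[right])   swap(&A[left], &A[right]);
            if (A[center] > A[right]) swap(&A[center], &A[right]);
            /* now A[left] <= A[center] <= A[right] */
            swap(&A[center], &A[right - 1]); /* the "last two lines" trick: */
            return A[right - 1];             /* only A[left+1..right-2] remains to scan */
        }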
  3. How is the array divided into subsets?

[Figure: the partitioning step]
The principle of quicksort will not be repeated here, but its core advantage over insertion sort is this: each time the pivot is swapped into position, that position is its final one, and it never needs to move again; at the same time, the array is divided into two subsets.
Next, a detail worth thinking about, and one that interviewers often probe in depth:
"If during the sorting process, there is exactly an element equal to the pivot pivot, how should we deal with it?" {\color{red} "If during the sorting process, there is exactly an element equal to the pivot pivot, how should we deal with it?"}" As If the row order through the process of , the positive well has membered prime the like to the main element P I V O T , should be the as how the processing of it ? "
There are really only two choices: stop and swap, or ignore it and keep scanning.
With the first method, in the extreme case where all elements are equal there are many seemingly useless comparisons and swaps, but the benefit is that the pivot keeps landing near the middle, so the time complexity stays O(N log N).
With the second method, those useless swaps are avoided, but in the same extreme case the pivot lands at one end every time and the time complexity is O(N²). So, emmm... you know: we should choose the first.
The drawback of quicksort is that it relies on recursion, which is not friendly for large-scale data: the recursive calls on tiny subproblems add up. So when the scale becomes small enough, a simple sort such as selection sort can be used to finish the job.
Finally, the code is given:

[Figure: quicksort code with cutoff]

Note that the cutoff here is the data-size threshold you set for switching to the simple sort, and the red box marks the interface called from outside.
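The figure's code is not reproduced here; below is a minimal C sketch of the whole routine under the ideas above, reusing the swap and median3 helpers from the earlier sketch. The scan stops on elements equal to the pivot (the first choice above), and below the cutoff a simple sort takes over; I use an insertion sort here as the simple sort, and the threshold value is only illustrative.

    #define CUTOFF 10                        /* illustrative threshold */

    /* Simple sort for small ranges: insertion sort on A[left..right]. */
    static void small_sort(int A[], int left, int right)
    {
        for (int i = left + 1; i <= right; i++) {
            int tmp = A[i], j = i;
            while (j > left && A[j - 1] > tmp) { A[j] = A[j - 1]; j--; }
            A[j] = tmp;
        }
    }

    static void qsort_core(int A[], int left, int right)
    {
        if (right - left < CUTOFF) {         /* small subproblem: simple sort */
            small_sort(A, left, right);
            return;
        }
        int pivot = median3(A, left, right); /* pivot parked at right - 1 */
        int i = left, j = right - 1;
        for (;;) {
            while (A[++i] < pivot) ;         /* stop on elements >= pivot */
            while (A[--j] > pivot) ;         /* stop on elements <= pivot */
            if (i < j) swap(&A[i], &A[j]);
            else break;
        }
        swap(&A[i], &A[right - 1]);          /* pivot lands in its final spot */
        qsort_core(A, left, i - 1);          /* sort the left subset */
        qsort_core(A, i + 1, right);         /* sort the right subset */
    }

    void quick_sort(int A[], int n)          /* the external interface */
    {
        qsort_core(A, 0, n - 1);
    }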

Table sorting

The idea is not to move the data records (keys) themselves but to move pointers (table subscripts) to do the sorting: an indirect sort.
[Figure: table sort example]
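A minimal C sketch of this idea, assuming int keys and using an insertion sort on the index table (all names are my own):

    /* Indirect sort: sort the index table, not the records. Afterwards,
       A[idx[0]], A[idx[1]], ... reads the records in sorted order. */
    void table_sort(const int A[], int idx[], int n)
    {
        for (int i = 0; i < n; i++) idx[i] = i;  /* initial table: identity */
        for (int i = 1; i < n; i++) {            /* insertion sort on indices */
            int t = idx[i], j = i;
            while (j > 0 && A[idx[j - 1]] > A[t]) {
                idx[j] = idx[j - 1];             /* move pointers, not data */
                j--;
            }
            idx[j] = t;
        }
    }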

Bucket sort

A sort that works only by comparing elements and exchanging them has a worst-case lower bound of Ω(N log N). So: while we sort, can we do something other than compare?
[Figure: bucket sort]
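A minimal C sketch of the idea, assuming the keys are small non-negative integers in a known range (the range and names are my own assumptions); each key is dropped into its bucket, and reading the buckets in order yields the sorted result without any comparisons:

    #include <string.h>

    #define MAXKEY 100                       /* illustrative key range */

    /* Bucket sort sketch for keys known to lie in [0, MAXKEY]. */
    void bucket_sort(int A[], int n)
    {
        int count[MAXKEY + 1];
        memset(count, 0, sizeof count);
        for (int i = 0; i < n; i++)          /* drop each key into its bucket */
            count[A[i]]++;
        int k = 0;
        for (int key = 0; key <= MAXKEY; key++)  /* read buckets in order */
            while (count[key]-- > 0)
                A[k++] = key;
    }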

Radix sort

Radix sort is an improved version of bucket sort.
[Figure: radix sort example]
P refers to the number of passes, which is on the order of log N. If the number of buckets is kept reasonably small, the whole sort runs in linear time.
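A minimal C sketch of LSD radix sort under these assumptions: non-negative int keys, base 10 (so ten buckets), with max_key supplied by the caller; each of the P passes is a stable bucket distribution by one digit:

    #include <stdlib.h>
    #include <string.h>

    /* LSD radix sort: one stable counting pass per decimal digit. */
    void radix_sort(int A[], int n, int max_key)
    {
        int *out = malloc(n * sizeof(int));
        if (!out) return;
        for (int exp = 1; max_key / exp > 0; exp *= 10) {
            int count[10] = {0};
            for (int i = 0; i < n; i++)      /* count digits into 10 buckets */
                count[(A[i] / exp) % 10]++;
            for (int d = 1; d < 10; d++)     /* prefix sums: bucket offsets */
                count[d] += count[d - 1];
            for (int i = n - 1; i >= 0; i--) /* stable placement, right to left */
                out[--count[(A[i] / exp) % 10]] = A[i];
            memcpy(A, out, n * sizeof(int)); /* result of this pass back into A */
        }
        free(out);
    }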

Summary

Explanation:
The reason selection sort is unstable is that its long-distance swaps can reverse the order of equal keys. A simple example makes it clear: in (7) 2 5 9 3 4 [7] 1 ..., when direct selection sort runs, (7) is swapped with 1 and ends up behind [7]; the original order of the two 7s has changed, so the sort is unstable.
The value of d for Shell sort depends on the choice of increment sequence.
Heap sort and merge sort are both O(N log N) on average and in the worst case, but the drawback of merge sort is that it needs an extra array's worth of space to shuttle the data back and forth; its advantage is stability.
Quicksort is unstable (long-distance swaps again, no need to repeat it), and since it is recursive it needs extra stack space.

[Figure: comparison table of the sorting algorithms]


Note:
All figures above are from the Zhejiang University Data Structures open course slides (source address).

Origin blog.csdn.net/weixin_41896265/article/details/108414565